
WHITE PAPER

H3 TRENDS IN AI ALGORITHMS: THE INFOSYS WAY

Abstract

Artificial Intelligence algorithms are the wheels of AI. Realizing the art of the possible with AI applications requires a deep understanding of these algorithms. This paper brings a perspective on the landscape of the AI algorithms that will shape key advancements across industries.


Today, technology adoption is influenced by business and technology uncertainties. These uncertainties drive organisations to evaluate technology adoption based on risks and returns. Broadly, technology-led disruptions can be classified into Horizon 1, Horizon 2 and Horizon 3. Horizon 1 (H1) technologies are those that are in mainstream client adoption and have steady business transactions, while H2 and H3 technologies are those that are yet to become mainstream but have started to open up interesting possibilities and potential returns for the future.

At the Infosys Center for Emerging Technology Solutions (iCETS), we continuously look at H2 and H3 technologies and their impact on client landscapes. These H2 and H3 technologies are very important to monitor, as they have the potential to transform or disrupt existing, well-oiled business models, hence fetching large returns. However, there are also associated risks from their adoption that need to be monitored, as some of them can have a high negative impact on compliance, safety and so on.

With the emergence and availability of several open datasets, the computational thrust from GPU availability, and the maturity of Artificial Intelligence (AI) algorithms, AI is making strong inroads into current and future IT ecosystems. Today, AI plays an integral role in IT strategy by driving new experiences and creating new art-of-the-possible scenarios. In this paper, we look at the important AI algorithms that are shaping various H3 possibilities of AI. Before we do that, here is a chart representing the broader AI algorithm landscape in the context of this paper.

[Figure 1.0: Horizon 3 AI Algorithms - Infosys Research. The chart plots Technology Uncertainty against Business Uncertainty, tracing a crawl-and-walk, run, fly progression from Core Offerings (mainstream) through New Offerings (incubated to new offerings) to Emerging Offerings (emerging investment opportunities), annotated with stage verbs (Deploy, Adopt, Scale, Enhance, Differentiate, Diversify, Envision, Invent, Disrupt). The algorithms and use cases at each horizon are listed in Table 1.0 below.]

H1 of AI, or "core offerings", are typically defined as algorithm-powered use cases that have become mainstream and will remain major investment areas for the current wave. In that respect, use cases such as product or customer recommendations, churn and sentiment analysis, leveraging algorithms such as Random Forest, Support Vector Machines (SVM), Naïve Bayes and n-grams based approaches, have been mainstream for some time and will continue to get woven into varied AI experiences.

H2 of AI, or "new offerings", use cases are the ones that are currently in an experimental, evolutionary mode and will have a major impact on the Artificial Intelligence systems that become mainstream in the second wave. Convolution Neural Networks (CNN) have laid the foundation for several art-of-the-possible Computer Vision use cases, ranging from object detection, image captioning and segmentation to facial recognition. Long Short-Term Memory (LSTM) and Recurrent Neural Nets (RNN) are helping to significantly improve the art of the possible in use cases such as language translation, sentence formulation, text summarization, topic extraction and so on. Word-vector based models, such as GloVe and Word2Vec, are helping in dealing with large, multi-dimensional text corpuses and in finding hidden, unspotted, complex, interwoven relationships and similarities between topics, entities and keywords.

These H2 AI algorithms are promising interesting new possibilities in various business functions; however, they are still in a nascent stage of adoption and user testing.

H3 of AI, or "emerging offerings", use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today. As these technologies are relatively new, more time is required to establish their weaknesses, strengths and nuances. In this paper we look at key H3 AI algorithmic trends and how we leverage them in various use cases built as part of our IP, the Infosys Enterprise Cognitive Platform (iECP).
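As a concrete aside, here is a minimal sketch of the word-similarity idea behind Word2Vec mentioned above, using the open-source gensim package on a toy corpus; the corpus, parameters and printed neighbours are purely illustrative and not from this paper.

# Illustrative Word2Vec sketch using gensim (assumed installed); toy corpus only.
from gensim.models import Word2Vec

sentences = [["loan", "credit", "bank"],
             ["movie", "film", "actor"],
             ["bank", "interest", "mortgage"]]
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, epochs=50)

# Nearest neighbours in the learned vector space hint at hidden relationships.
print(model.wv.most_similar("bank", topn=2))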

Horizon 1 (Mainstream)

Algorithms: Logistic Regression, Naive Bayes, Random Forest, Support Vector Machines (SVM), Collaborative Filtering, n-grams

Use Cases: Recommendations, Prediction, Document/Image Classification, Document/Image Clustering, Sentiment Analysis, Named Entity Recognition (NER), Keyword Extractions

Horizon 2 (Adopt, Scale)

Algorithms: Convolution Neural Networks (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), Word2Vec, GloVe, Transfer Learning (Vision)

Use Cases: Object Detection, Face Recognition, Product/Brand Recognition and Classification, Speech Recognition, Sentence Completion, Speech Transcriptions, Topic Classification, Topic Extraction, Intent Mining, Question Extraction

Horizon 3 (Envision, Invent, Disrupt)

Algorithms: Explainable AI, Generative Networks, Fine Grained Classification, Capsule Networks, Meta Learning, Transfer Learning (Text), Single Shot Learning, Reinforcement Learning, Auto ML, Neural Architecture Search (NAS)

Use Cases: Scene Captioning, Scene Detection, Store Footfall Counts, Specific Object Class Detection, Sentence Completion, Video Scene Prediction, Auto Learning, Fake Image/Art Generation, Music Generation, Data Augmentation

Table 1.0: H3 Algorithms and Use Cases - Infosys Research


Explainable AI (XAI)

Neural Network algorithms are considered to derive hidden patterns from data that many conventional, best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproducible for consistency and correctness, as some of these results can have a profound impact on livelihoods.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: "A deep-learning system doesn't have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become." It is to address these issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and it can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help in understanding the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers of a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide higher-order abstract concepts such as objects. Network dissection helps in associating these established units with concepts: the units learn from labeled concepts during the supervised training stages, and network dissection shows how, and in what magnitude, they are influenced by channel activations.

Figure 2.0: Feature Visualization across layers (Edges, Textures, Patterns, Parts, Objects). Source: Olah et al., 2017 (CC-BY 4.0)

Several frameworks are currently evolving to improve the explainability of models. Two well-known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): LIME treats the model as a black box and creates a simpler surrogate model for which explainability is supported or feasible (typically a sparse linear model such as Logistic Regression). The surrogate model is used to evaluate different components of an image by perturbing the inputs and evaluating their impact on the result, thereby deciding which parts of the image are most important in arriving at the result. Since the original model does not participate directly, the approach is model independent. The challenge with this approach is that even when surrogate-model based explanations are relevant to the model they are used on, they may not generalize precisely, or be one-to-one mappable to the original model, all the time.

Steps (for the image classification example in Figure 3.0):

1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).

2. For each example, get the probability that a tree frog is in the image as per the original model.

3. Using these created data points, train a simple linear model (Logistic Regression etc.) and get the results.

4. The superpixels with the highest positive weights become the explanation.

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations): SHAP uses a game-theory based approach to explain an outcome: it applies various permutations and combinations of features, measures their effect on the delta of the result (predicted minus actual), and then computes the average score for each feature to explain the result. For image use cases, it marks the dominating feature areas by coloring the pixels in the image. SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.
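To make the two frameworks concrete, here is a minimal usage sketch, assuming the open-source lime and shap Python packages, a fitted scikit-learn style classifier model, tabular arrays X_train/X_test, and feature_names/class_names lists; it is illustrative and not this paper's implementation.

# Illustrative LIME + SHAP usage sketch; `model`, `X_train`, `X_test`,
# `feature_names` and `class_names` are assumed to exist.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer

# LIME: perturb one instance, query the black-box model, fit a local linear surrogate.
lime_explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=feature_names,
    class_names=class_names,
    mode="classification",
)
lime_exp = lime_explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())  # top features with their local weights

# SHAP: average marginal contribution of each feature over feature coalitions.
background = shap.sample(X_train, 100)  # background set for the expectation
shap_explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = shap_explainer.shap_values(X_test[:10])
shap.summary_plot(shap_values, X_test[:10], feature_names=feature_names)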

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. As Generative AI matures in the near term, it will augment many jobs, and it will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network, which work together to provide a high-level simulation of conceptual tasks.

To train a Generative model, we first collect a large amount of data in some domain (e.g., think millions of images, sentences or sounds) and then train the model to generate similar data. The Generative network generates data to fool the Discriminative network, while the Discriminative network learns by identifying real vs. fake data received from the Generative network. The generator trains with an objective function based on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake data. Both networks learn through back propagation. The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network; a minimal training-loop sketch follows below.
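Here is a minimal sketch of that adversarial loop in PyTorch; generator, discriminator, dataloader and latent_dim are assumed to be defined elsewhere, and the skeleton is illustrative rather than a production GAN.

# Minimal GAN training-loop sketch (illustrative). Assumes `generator` maps noise
# to a fake sample and `discriminator` outputs P(sample is real) in [0, 1].
import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for real_batch in dataloader:  # assumed DataLoader of real data
    b = real_batch.size(0)
    fake_batch = generator(torch.randn(b, latent_dim))

    # Discriminator step: learn to label real as 1 and fake as 0.
    opt_d.zero_grad()
    loss_d = bce(discriminator(real_batch), torch.ones(b, 1)) + \
             bce(discriminator(fake_batch.detach()), torch.zeros(b, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: learn to make the discriminator label fakes as real.
    opt_g.zero_grad()
    loss_g = bce(discriminator(fake_batch), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()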

Generative Networks can be of multiple types, depending on the objective they are designed for, one example being Neural Style Transfer (NST).

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.
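A small sketch of the NST objective follows, assuming content_loss and style_loss helper functions that compare feature maps (and Gram matrices) from a pretrained CNN, with illustrative weights alpha and beta; note that the generated image G itself is optimized, not network weights.

# NST objective sketch (illustrative). `content_loss`/`style_loss` are assumed
# helpers built on a pretrained CNN's feature maps; alpha/beta are illustrative.
import torch

G = C.clone().requires_grad_(True)  # start the generated image G from content C
opt = torch.optim.Adam([G], lr=0.02)

for step in range(500):
    opt.zero_grad()
    loss = alpha * content_loss(C, G) + beta * style_loss(S, G)
    loss.backward()  # gradients flow into the pixels of G
    opt.step()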


Some of the other GAN variations that are popular are:

• Super Resolution GAN (SRGAN), which helps improve the quality of images.

• Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.

• Sketch-GAN, a Generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.

• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.

• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.

[Figure 4.0: Novel Artistic Images through Neural Style Transfer. Three content image + style image = generated image examples: a colorful circle + a blue painting = the colorful circle in the blue painting's style; the Louvre museum + an impressionist-style painting = the Louvre with an impressionist style; the ancient city of Persepolis + The Starry Night (Van Gogh) = Persepolis in Van Gogh's style. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer]

Fine Grained Classification

Classification of an object into broad categories, such as car, table or flower, is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used for differentiating objects.

Examples of Fine Grained Classification are:

• Fine-grained clothing style finders, e.g., the type of a shoe

• Recognizing a car type

• Recognizing the breed of a dog, or a plant species, insect or bird species

In Fine Grained Classification, the progression through an 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches

• Feature representations that better preserve fine-grained information

• Segmentation-based approaches that facilitate extraction of purer features and part/pose-normalized feature spaces

• Pose Normalization Schemes

Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization
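As an illustrative sketch of the "extract features from multiple layers and concatenate" idea, and not the exact pipeline from the paper, the following Keras snippet pools features from a mid and a deep layer of a pretrained VGG16 and feeds the concatenation to a fine-grained classifier head (200 classes is an arbitrary choice, echoing bird-species datasets).

# Multi-layer feature extraction + concatenation sketch (illustrative).
import tensorflow as tf

backbone = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                       input_shape=(224, 224, 3))
mid = backbone.get_layer("block3_pool").output   # mid-level parts/textures
deep = backbone.get_layer("block5_pool").output  # high-level object features

features = tf.keras.layers.Concatenate()([
    tf.keras.layers.GlobalAveragePooling2D()(mid),
    tf.keras.layers.GlobalAveragePooling2D()(deep),
])
out = tf.keras.layers.Dense(200, activation="softmax")(features)  # e.g., 200 species
model = tf.keras.Model(backbone.input, out)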


Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine grained classification approach for a car detection system.

a. Parts are detected using a collection of unsupervised part detectors.

b. A grid of discriminative features is output. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which retain spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.

c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.

d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.

Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition

Capsule Network

Convolutional Networks are so far the de facto and well accepted algorithms for working with image based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble up the stronger features, to derive colors, textures, edges and shapes, and to establish structures through the lower to the highest layers.

Given the face of a person, a CNN identifies the face by establishing the components of the face: eyes, ears, eyebrows, lips, chin etc. However, if the facial image is provided with an incorrect position and alignment of the eyes and eyebrows, or, say, with the eyebrows swapped with the lips and the ears placed on the forehead, the same trained CNN algorithm would still go on and detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to the CNN's inability to store information on the relative positions of various objects. Capsule Network, invented by Geoffrey Hinton, addresses exactly this problem of CNN by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, and each capsule consists of several neurons. Capsules in lower layers, called primary capsules, are trained to detect an object (e.g., a triangle or circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as its coordinates, rotation angle etc. Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears etc.

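As an aside, here is a minimal NumPy sketch of the "squash" non-linearity from the Dynamic Routing Between Capsules paper cited at Figure 7.0 below; it is what lets a capsule's output vector length behave as a presence probability while its orientation carries the pose.

# Squash non-linearity sketch: preserves a vector's direction (orientation/pose)
# while scaling its length into [0, 1] so length can act as a presence probability.
import numpy as np

def squash(s, eps=1e-9):
    norm_sq = np.sum(s ** 2)
    return (norm_sq / (1.0 + norm_sq)) * (s / np.sqrt(norm_sq + eps))

v = squash(np.array([0.5, 2.0, -1.0]))
print(np.linalg.norm(v))  # length < 1: a probability-like presence score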

Routing by Agreement

Unlike CNN, which primarily bubbles up higher-order features using max or average pooling, a Capsule Network bubbles up features using routing by agreement, where every capsule participates in choosing the shape by voting (in a democratic-election way).

In Figure 8.0 below:

• The lower level corresponds to rectangles, triangles and circles.

• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This, in turn, will make the predictions by the rectangle and the triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN

• Less data for training: Capsule Networks need much less data for training (almost 10 percent of what a CNN needs).

• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.

• Preserve pose and position: They preserve pose and position information, as against CNN.

• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.

• Reconstruction vs. mere classification: CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.

• Information retention vs. loss: With CNN, during edge detection, the kernel works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and it would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research; they are relatively new, and mostly tested and benchmarked on the MNIST dataset, but they will be the future in working with the massive use cases emerging from Vision datasets.

Figure 7.0: A simple CapsNet with 3 layers; this model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks


Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolution Neural Net, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, biases, dropouts, and the activation function used to activate the neurons (such as sigmoid, tanh and ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Types of Meta-Learning Models

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the pattern of the problem, such as those based on the boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Few Shots Meta-Learning

This learning technique focuses on learning from a few instances of data. Typically, Neural Nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that focus on improving gradient descent techniques.

Metric Meta-Learning

In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging various optimization parameters that are established for the given metric space.

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from, and benefit from, existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning, which already detects various types of vehicles such as cars, trucks and bicycles, needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, a model needs to be trained from scratch, and during training the right weights are arrived at by doing many iterations (epochs) of forward and back propagation, which takes a significant amount of computational power and time. In addition, Vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with a significantly smaller number of images (5 to 10 percent of the images needed to train a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basics, identifying edges, curves and shapes in the earlier layers, it needs to learn only the higher-order features specific to airplanes, on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch; a minimal sketch of this freeze-and-extend approach follows below.

Transfer Learning helps in saving a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above discussed vehicle model to train a facial recognition model.

Another key thing during Transfer Learning is that it is important to understand the details of the data on which the underlying model was trained, as Transfer Learning can implicitly push the built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimentative purposes.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the ground for new vision and text use cases.

Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
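Here is a minimal freeze-and-extend sketch in Keras, illustrating, though not reproducing, the approach in Figure 9.0: a pretrained ImageNet backbone is frozen and only a small new head is trained for the new task (the airplane example above); the dataset train_ds is assumed.

# Transfer Learning sketch (illustrative): reuse pretrained weights, train a new head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # keep the pre-trained weights frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new head: airplane vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train_ds is an assumed tf.data.Dataset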

Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences with just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning: otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold.

The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.

"Positive" is another image of the same person.

"Negative" is an image of a different person.

Figure 10.0: Encoding approach, inspired by the ML course from Coursera
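A small sketch of the training objective (triplet loss) and the distance-threshold check follows, assuming an embed function that maps an aligned face image to its 128-dimensional encoding (e.g., from a pre-trained FaceNet); the margin and threshold values are illustrative.

# Triplet-loss and face-matching sketch (illustrative); `embed` is assumed.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull (anchor, positive) together and push (anchor, negative) apart.
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)

def same_person(img_a, img_b, threshold=0.7):
    # Compare the Euclidean distance between encodings against a tuned threshold.
    return np.linalg.norm(embed(img_a) - embed(img_b)) < threshold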


Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have the objective of maximizing short-term or long-term rewards. The discipline uses deep learning techniques to bring human-level performance to the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.

In reinforcement learning, the policy π controls what action we should take, and the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state: the value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state.

Three Approaches to Reinforcement Learning

Value Based: In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state based on an action. The agent will use this value function to select which state to choose at each step.

Policy Based: In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.

[Figure 11.0: Q-learning (a state fed to a Q-table yields a Q-value) contrasted with Deep Q-learning (a state fed to a Deep Q Neural Network yields a Q-value per action), with action = policy(state) and the expected discounted reward given that state. Schema inspired by the Q-learning notebook by Udacity]
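For the value-based approach, here is a minimal tabular Q-learning sketch, illustrative and not from the paper, showing the update rule and an epsilon-greedy action choice; the state/action counts and hyper-parameters are arbitrary.

# Tabular Q-learning sketch (illustrative).
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))     # the Q-table
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

def step_update(state, action, reward, next_state):
    # Move Q(s, a) toward the observed reward plus the discounted best future value.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])

def choose_action(state):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[state].argmax())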


Model Based

In model-based RL, we model the environment: we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human beings at the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network, which tells what moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate the expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training on previous games data to train the deep network. The deep network training was done by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree-search results for the next best move in memory and do very large computations that are difficult for a human brain.

Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing the data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online setting. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses the progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

[Figure 12.0: An example of the auto-sklearn pipeline: training data (Xtrain, Ytrain) and a test budget flow through meta-learning over a hand-crafted portfolio and an ML pipeline of data processor, feature preprocessor and classifier tuned by Bayesian Optimization, with a final ensemble producing Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML]


Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding categorical values, data scaling and normalization, feature pre-processing, and the selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for the selection of key features, for hyper-parameter optimization, and for speeding up algorithmic outputs.

BOHB (Bayesian Optimization Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly for executing even simple Machine Learning workloads, such as a CNN algorithm to classify objects; if multiple alternate algorithms were to be executed, the computation dollars needed would be exponential. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations of several cloud-infrastructure companies promoting AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed to certain tasks, such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.


Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing the Neural Network layers, filters or channels, and filter sizes, and selecting other optimum hyper-parameters, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet, have significantly evolved. However, selecting the right architecture for the right problem is also a skill, due to the presence of various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

[Figure 13.0: NAS and Hyperparameter Optimization shown as overlapping sub-areas of AutoML. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?]

Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, or product identification) would need a different neural network architecture style as against Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domain data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures can be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as a Bayesian method or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It can be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing, network morphism etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space. A toy sketch of the three components working together follows below.

[Figure 14.0: Components of NAS. Search space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?]
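To make the three components concrete, here is a toy sketch, not a real NAS system: the search space is a handful of MLP layer configurations, the optimization method is random search, and the evaluation method is partial training (few iterations) with cross-validation; X_train and y_train are assumed.

# Toy architecture-search sketch (illustrative only).
import random
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

search_space = [(64,), (128,), (64, 64), (128, 64), (256, 128, 64)]

best_arch, best_score = None, -1.0
for _ in range(10):                     # optimization method: random search
    arch = random.choice(search_space)  # sample from the search space
    model = MLPClassifier(hidden_layer_sizes=arch, max_iter=50)  # partial training
    score = cross_val_score(model, X_train, y_train, cv=3).mean()  # evaluation method
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, best_score)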

Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases, and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend: Use Cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection etc.

2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation

3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection

4. Capsule Networks: Image Re-construction, Image Comparison/Matching

5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections

6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies

7. Single Shot Learning: Face Recognition, Face Verification

8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless Cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections

9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering

10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification etc.

Table 2.0: AI Use Cases - Infosys Research


References

1. Explainable AI (XAI)

• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification

• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks

• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning

• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning

• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning

• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)

• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML

• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)

• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform

• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

About the Author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices-API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and in working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured-Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

Page 2: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Today technology adoption is influenced

by business and technology uncertainties

These uncertainties drive organisations

to evaluate technology adoptions based

on risks and returns Broadly technology

led disruptions can be classified into

Horizon1 Horizon 2 and Horizon 3 Horizon

1 or H1 technologies are those that are

in mainstream client adoptions and have

steady business transactions while H2

and H3 are those that are yet to become

mainstream but have started to spring

interesting possibilities and potential

returns in the future

At Infosys Center for Emerging

Technology solutions (iCETS) we

continuously look at H2 and H3

technologies and their impact on client

landscapes These H2 and H3 technologies

are very important to be monitored as they

have the potential to transform or disrupt

existing well-oiled business models hence

fetching large returns However there

are also associated risks from adoptions

that need to be monitored as some of

those can have higher negative impact on

compliance safety and so on

With the emergence and availability of

several open datasets computational

thrust with GPU availability and maturity

of Artificial Intelligence (AI) algorithms AI

is making strong inroads into current and

future of IT ecosystems Today AI plays

an integral role in IT strategy by driving

new experiences and creating new art of

possibilities In this paper we try to look

at important AI algorithms that are

shaping various H3 of AI possibilities

While we do that here is a chart

representing the broader AI algorithm

landscape in the context of this paper

Business Uncertainty

Tech

nolo

gy U

ncer

tain

ty

Fly

Run

Crawl a

nd Wal

k

Emerg

ing Oerings

Core

O

erings

New

O

erings

DierentiateDiversify Deploy

AdoptScale

Enhance

EnvisionInvent

Disrupt

Emerging InvestmentOpportunities

Algorithms Use Cases

Incubated to New Oerings

Main stream

bull Explainable AIbull Generative Networksbull Fine Grained Classicationbull Capsule Networksbull Meta Learningbull Transfer Learning (Text)bull Single Shot Learningbull Reinforcement Learningbull Auto MLbull Neural Architecture Search (NAS)

bull Scene Captioningbull Scene detectionbull Store Footfall countsbull Specic object class detectionbull Sentence Completionbull Video Scene predictionbull Auto learningbull Fake images Art generationbull Music generationbull Data Augmentation

bull Convolution Neural Networks (CNN)bull Long Term Short Term Memory

(LSTM)bull Recurrent Neural Networks (RNN)bull Word2Vecbull GloVebull Transfer Learning (Vision)

bull Object detectionbull Product brand recognition

classicationbull Facial Recognitionbull Speech Recognitionbull Speech Transcriptionsbull Topic Classication extraction

bull Logistic Regressionbull Naive Bayesbull Random Forestbull Support Vector Machines (SVM)bull Collaborative Filteringbull n-grams

bull Recommendationsbull Predictionbull Document image

classicationbull Document image Clusteringbull Sentiment Analysis

External Document copy 2019 Infosys Limited

Figure 10 Horizon 3 AI Algorithms- Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

H1 of AI ldquocore offeringsrdquo are typically defined as algorithm powered use cases that have become mainstream and will remain major investment areas for the current wave In that respect adoption of use cases such as product or customer recommendations churn and sentiment analysis and leveraging algorithms such as Random Forest Support Vector Machines (SVM) Naiumlve Bayes and n-grams based approaches have been mainstream for some time and will continue to get weaved into varied AI experiences

H2 of AI ldquonew offeringsrdquo use cases are the ones that are currently in experimentative evolutionary mode and will have major impact on Artificial Intelligent systems

that will be mainstream in second wave Convolution Neural Networks (CNN) have laid foundation for several art of possible Computer Vision use cases ranging from object detection image captioning and segmentation to facial recognition Long Term Short Term Memory(LSTM) and Recurrent Neural Nets (RNN) are helping to significantly improve art of possibilities from use cases such as language translations sentence formulation text summarization topic extraction and so on Word vectors based models such as GloVe and Word2Vec are helping in dealing with large multi dimensional text corpuses and finding hidden unspotted complex interwoven relationships and similarity between topics entities and keywords

These H2 AI algorithms are promising interesting new possibilities in various business functions however it is still in nascent stage of adoption and user testing

H3 of AI ldquoemerging offeringsrdquo use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today As these technologies are relatively new it requires more time to establish its weaknesses strengths and nuances In this paper we look at key H3 AI algorithmic trends and how we leverage these in various use cases built as part of our IP Infosys Enterprise Cognitive platform (iECP)

Horizon 1(Mainstream)

Horizon 2(Adopt Scale)

Horizon 3(Envision Invent Disrupt)

bull Explainable AIbull Generative Networksbull Fine Grained Classicationbull Capsule Networksbull Meta Learningbull Transfer Learning (Text)bull Single Shot Learningbull Reinforcement Learningbull Auto MLbull Neural Architecture Search

(NAS)

bull Convolution Neural Networks (CNN)

bull Long Term Short Term Memory (LSTM)

bull Recurrent Neural Networks (RNN)

bull Word2Vecbull GloVebull Transfer Learning (Vision)

bull Logistic Regressionbull Naive Bayesbull Random Forestbull Support Vector Machines

(SVM)bull Collaborative Filteringbull n-grams

Algorithms

Use Cases

bull Scene Captioningbull Scene Detectionbull Store Footfall Countsbull Specic Object Class

Detectionbull Sentence Completionbull Video Scene Predictionbull Auto Learningbull Fake Images Art Generationbull Music Generationbull Data Augmentation

bull Object Detectionbull Face Recognitionbull Product Brand Recognition

Classicationbull Speech Recognitionbull Sentence Completionbull Speech Transcriptionsbull Topic Classicationbull Topic Extractionbull Intent Miningbull Question Extraction

bull Recommendationsbull Predictionbull Document Image

Classicationbull Document Image Clusteringbull Sentiment Analysisbull Named Entity Recognition

(NER)bull Keyword Extractions

Table 10 H3 Algorithms and Usecases- Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Explainable AI (XAI)

Neural Network algorithms are considered to derive hidden patterns from data that many other conventional best-of-breed Machine Learning algorithms, such as Support Vector Machines, Random Forest and Naïve Bayes, are unable to establish. However, there is an increasing rate of incorrect and unexplainable decisions and results produced by Neural Network algorithms in activities such as credit lending, skilled job hiring and facial recognition. Given this scenario, AI results should be justified, explained and reproduced for consistency and correctness, as some of these results can have a profound impact on livelihood.

Geoffrey Hinton (University of Toronto), often called the godfather of deep learning, explains: "A deep-learning system doesn't have any explanatory power. The more powerful the deep-learning system becomes, the more opaque it can become." It is to address the issues of transparency in AI that Explainable AI was developed. Explainable AI (XAI) as a framework increases the transparency of black-box algorithms by providing explanations for the predictions made, and can accurately explain a prediction at the individual level.

Here are a few approaches, provided through certain frameworks, that can help understand the traceability of results.

Feature Visualization, as depicted in the figure below, helps in visualizing the various layers in a neural network. It helps establish that lower layers are useful in learning features such as edges and textures, whereas higher layers provide higher-order abstract concepts such as objects. Network dissection helps in associating these established units to concepts: the units learn from labeled concepts during supervised training stages, and network dissection shows how, and to what magnitude, they are influenced by channel activations.

Figure 2.0: Feature Visualisation (edges, textures, patterns, parts, objects). Source: Olah et al., 2017 (CC-BY 4.0)

Several frameworks are currently evolving to improve the explainability of models. Two known frameworks in this space are LIME and SHAP.

LIME (Local Interpretable Model-agnostic Explanations): It treats the model as a black box and tries to create a simpler surrogate model where explainability is supported or feasible, such as Logistic Regression. The surrogate model is then used to evaluate different components of the image by perturbing the inputs and evaluating the impact on the result, thereby deciding which parts of the image are most important in arriving at results. Since the original model does not participate directly, the approach is model independent. The challenge with this approach is that even when the surrogate-model based explanations are relevant to the model they are used on, they may not be precisely generalizable or one-to-one mappable to the original model all the time.


Steps:
1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).
2. For each example, get the probability that a tree frog is in the image as per the original model.
3. Using these created data points, train a simple linear model (Logistic Regression etc.) and get the results.
4. Superpixels with the highest positive weights become the explanation.

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions
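To make the mechanics concrete, here is a minimal, illustrative sketch using the open-source lime package on tabular data; the dataset and classifier are stand-ins chosen for brevity, not artifacts from iECP.

from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# A stand-in black-box model to be explained
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# LIME perturbs the instance, queries the black box, and fits a local surrogate
explainer = LimeTabularExplainer(X, class_names=["setosa", "versicolor", "virginica"])
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())  # features with the highest local weights form the explanation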

SHAP (SHapley Additive exPlanations): It uses a game theory based approach to explain the outcome, using various permutations and combinations of features and their effect on the delta of the result (predicted minus actual), and then computing the average score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image. SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.
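Similarly, here is a minimal SHAP sketch, again with a stand-in dataset and model (TreeExplainer is SHAP's fast path for tree ensembles). The last line illustrates the additive property: the per-feature contributions plus the expected value recover the prediction.

import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)          # game-theoretic Shapley values for trees
shap_values = explainer.shap_values(X[:100])   # per-feature contribution to each prediction
print(shap_values[0])                                    # contributions for the first sample
print(shap_values[0].sum() + explainer.expected_value)   # approximately model.predict(X[:1])[0]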


Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. As Generative AI matures in the near term it will augment many jobs, and it will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network, which work together to provide a high-level simulation of conceptual tasks. To train a Generative model, we first collect a large amount of data in some domain (e.g., think millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs. fake data received from the generative network. The generator trains with an objective function based on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake. Both networks learn through backpropagation (a minimal training-loop sketch follows at the end of this section). The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network.

Generative Networks can be of multiple types depending on the objective they are designed for, one example being Neural Style Transfer.

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S. For example: a colorful circle (content) merged with a blue painting (style) yields a colorful circle with a blue painting style; the Louvre museum merged with an impressionist style painting yields a Louvre painting in impressionist style; and the ancient city of Persepolis merged with The Starry Night (Van Gogh) yields Persepolis in Van Gogh style.

Figure 4.0: Novel Artistic Images through Neural Style Transfer. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer

Some of the other popular GAN variations are:
• Super Resolution GAN (SRGAN), which helps improve the quality of images.
• Stack-GAN, which generates realistic looking photographs from textual descriptions of simple objects like birds and flowers.
• Sketch-GAN, a Generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.
• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.
• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.
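The adversarial loop described above can be made concrete in a few lines. The following PyTorch sketch trains a toy GAN on 1-D Gaussian "real" data; the architectures and hyper-parameters are illustrative assumptions, not a production recipe.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))               # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()) # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "real" data drawn from N(2.0, 0.5)
    fake = G(torch.randn(64, 8))            # generator maps noise to candidate data

    # Discriminator: label real as 1 and fake as 0 (trains on not being fooled)
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: trained on whether it fools the discriminator into outputting 1
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()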


Fine Grained Classification

Classification of an object into broad categories such as car, table or flower is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are being trained on and used for differentiation between objects.

Examples of Fine Grained Classification are:
• Fine grained clothing style finder, type of a shoe, etc.
• Recognizing a car type
• Recognizing the breed of a dog, plant species, insect, bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features. Finding those subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches
• Feature representations that better preserve fine-grained information
• Segmentation-based approaches that facilitate extraction of purer features and part/pose normalized feature spaces
• Pose normalization schemes

In Fine Grained Classification, the progression through an 8-layer CNN network can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine grained classification. In the image below, given a test image, groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Hoen et al., Bird Species Categorization

Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine grained classification approach for a car detection system:
a. Detect parts using a collection of unsupervised part detectors.
b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which retain spatial information.) The appearance of each detected part, using the learned CNN features, is described by pooling in the detected region of each part.
c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.
d. A standard CNN passes the output of the convolutional layers through several fully connected layers in order to make a prediction.

Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition

Capsule Network

Convolutional Networks are so far the de facto and well accepted algorithms for working with image based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving with pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with incorrect position and alignment of eyes and eyebrows, or say with eyebrows swapped with lips and ears placed on the forehead, the same trained CNN algorithm would still go on to detect it as a human face. This is a huge drawback of the CNN algorithm, and it happens due to CNN's inability to store information on the relative position of various objects.

Capsule Networks, invented by Geoffrey Hinton, address exactly this problem of CNNs by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, where each capsule consists of several neurons. Capsules in lower layers are called primary capsules and are trained to detect an object (e.g., a triangle or circle) within a given region of an image. Each capsule outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc. (A small sketch of the non-linearity that produces this length property follows below.) Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
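In the CapsNet paper cited below (Sabour et al.), the length-as-probability property comes from a "squash" non-linearity applied to each capsule's raw output vector. Here is a minimal NumPy sketch of that function, included as an assumption based on the cited paper rather than code from this one:

import numpy as np

def squash(s, eps=1e-8):
    # Shrink the vector so its length lies in (0, 1) while keeping its direction:
    # length encodes presence probability, orientation encodes pose parameters.
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

print(np.linalg.norm(squash(np.array([3.0, 4.0]))))  # < 1, same direction as input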


Routing by Agreement

Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting (in the manner of a democratic election).

In Figure 8.0 below:
• The lower level corresponds to rectangles, triangles and circles.
• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantages over CNN
• Less data for training: Capsule Networks need far less data for training (almost 10%) as compared to CNNs.
• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation and bandwidth.
• Preservation of pose and position: They preserve pose and position information, as against CNNs.
• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.
• Reconstruction vs. mere classification: A CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.
• Information retention vs. loss: With CNNs, a kernel for edge detection works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, CNNs work well because there are very few ways to describe an edge. Once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting, which is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset, but they will be the future in working with the massive use cases emerging from vision datasets.

Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks

Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to detect y (the dependent variable, say classifying an image as cat or dog) given a set of x (independent variables, the images of cats and dogs). This process involves selecting an algorithm, such as a Convolution Neural Net, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, learning rate, weights, bias, dropouts, and the activation function used to activate a neuron, such as sigmoid, tanh or ReLU. The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering, data and compute intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while some remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning: This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models, an example being facial recognition systems using Single Shot Learning (explained in detail in the Single Shot Learning section). A toy sketch of few-shot classification follows this list.

Optimizer Meta-Learning: In this method the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning: In this learning method the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging various optimization parameters established for the given metric space.

Recurrent Model Meta-Learning: This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.
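As a toy illustration of the few-shot idea, the sketch below classifies a new example by comparing it with class "prototypes" (mean embeddings) built from only three labeled examples per class, in the spirit of prototypical networks; the embedding dimension and data here are synthetic assumptions.

import numpy as np

def prototypes(support_x, support_y):
    # One mean embedding per class, computed from a handful of labeled examples
    return {c: support_x[support_y == c].mean(axis=0) for c in np.unique(support_y)}

def classify(x, protos):
    # Assign x to the class whose prototype is nearest in embedding space
    return min(protos, key=lambda c: np.linalg.norm(x - protos[c]))

rng = np.random.default_rng(0)
support_x = np.vstack([rng.normal(0, 1, (3, 5)), rng.normal(3, 1, (3, 5))])  # 3 shots per class
support_y = np.array([0, 0, 0, 1, 1, 1])
print(classify(rng.normal(3, 1, 5), prototypes(support_x, support_y)))      # expected: 1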

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning, which already detects various types of vehicles such as cars, trucks and bicycles, needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at through many iterations (epochs) of forward and back propagation, which takes a significant amount of computational power and time. In addition, vision models need a significant amount of image data, such as, in this example, images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model with a significantly smaller number of images (5 to 10 percent of the images needed for training a ground-up model) for the model to start detecting. As the pre-trained model has already learnt some basics around identifying edges, curves and shapes in the earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.


Transfer Learning helps save a significant amount of data, computational power and time in training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above discussed vehicle model to train a facial recognition model.

Another key consideration during Transfer Learning is understanding the details of the data on which the underlying model was trained, as it can implicitly push the built-in biases from the underlying data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimentation purposes.

Having used the human brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution and so have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the foundation for new vision and text use cases. (A minimal fine-tuning sketch follows the figure below.)

Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
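Here is a minimal fine-tuning sketch with Keras, assuming an ImageNet pre-trained base and a hypothetical airplane/not-airplane dataset; the choice of MobileNetV2 and the layer sizes are illustrative, not prescriptive.

import tensorflow as tf

# Pre-trained convolutional base: its weights already encode edges, curves and shapes
base = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False  # freeze pre-trained weights; only the new head will learn

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new layer: airplane vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(airplane_dataset, epochs=5)  # hypothetical dataset; far fewer images needed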

Single Shot Learning

Humans have the impressive skill of reasoning about new concepts and experiences with just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial encoding based approach on top of it, can be very effective in establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger, where:
• "Anchor" is the image of the person for whom the recognition model needs to be trained.
• "Positive" is another image of the same person.
• "Negative" is an image of a different person.

Figure 10.0: Encoding approach, inspired by the ML Course from Coursera
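A minimal sketch of the encoding-and-distance idea follows; the encodings would come from a pre-trained network such as FaceNet, and the 0.7 threshold is an illustrative assumption to be tuned on validation pairs.

import numpy as np

def is_same_person(enc_a, enc_b, threshold=0.7):
    # Two 128-dimensional face encodings are "the same person" if close enough
    return np.linalg.norm(enc_a - enc_b) < threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Training objective: pull (Anchor, Positive) together, push (Anchor, Negative) apart
    pos = np.sum((anchor - positive) ** 2)
    neg = np.sum((anchor - negative) ** 2)
    return max(pos - neg + margin, 0.0)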

Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have the objective of maximizing short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to the given task. Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.

In reinforcement learning, a policy π controls what action we should take, while a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based: In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent uses this value function to select which state to choose at each step. (A minimal Q-learning sketch follows Figure 11.0 below.)

Policy Based: In policy-based RL, we directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:
1. Deterministic: a policy which, at a given state, will always return the same action.
2. Stochastic: a policy that outputs a probability distribution over actions.

Value based and policy based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.

Figure 11.0: Q-learning vs. Deep Q-learning schema (a Q-table maps state and action to a Q value, while a Deep Q Neural Network maps a state to a Q value per action; action = policy(state), with the expected reward discounted). Schema inspired by the Q-learning notebook by Udacity
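A minimal value-based sketch follows: the classic tabular Q-learning update, with a placeholder environment standing in for a real task (the states, actions, rewards and transition function are all illustrative assumptions).

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))      # the Q-table: value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate

def env_step(state, action):
    # Placeholder transition and reward; a real environment would supply these
    next_state = (state + action) % n_states
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(10_000):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore
    action = np.random.randint(n_actions) if np.random.rand() < epsilon else int(np.argmax(Q[state]))
    next_state, reward = env_step(state, action)
    # Move Q(s, a) toward the observed reward plus the discounted best future value
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state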


Model Based: In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations. When the model based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human beings in the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising, and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous game data. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.

Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; and deploying and monitoring the machine learning system in an online setting. Such machine learning solution design requires an expert Data Scientist to complete the pipeline. As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise. AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

Figure 12.0: An example Auto-sklearn pipeline: given Xtrain, Ytrain, Xtest and a budget, a hand-crafted portfolio and meta learning warm-start Bayesian optimization over the ML pipeline (data processor, feature preprocessor, classifier), followed by ensemble building to produce Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML


Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
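And a slightly fuller, self-contained variant (assumed settings; auto-sklearn must be installed, and the small time budgets here are only to keep the example quick):

import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget in seconds
    per_run_time_limit=30,        # budget per candidate pipeline
)
cls.fit(X_train, y_train)         # searches algorithms and hyper-parameters automatically
print(cls.score(X_test, y_test))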

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. SMAC is useful for selection of key features, hyper-parameter optimization, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored in specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple alternate algorithms were to be executed, the computation dollars needed would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will depend on two things: one, the maturity of the AutoML pipeline, and two, but more importantly, how quickly GPU clusters become cheap, the second being most critical. Selling cloud GPU capacity could be one of the motivations of several cloud infrastructure companies promoting AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks, such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.


Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh neural net architecture involves an expert establishing and organizing neural network layers, filters or channels and filter sizes, and selecting other optimal hyper-parameters, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet, have significantly evolved. However, selection of the right architecture for the right problem is also a skill, due to the presence of various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

Figure 13.0: NAS in context (hyperparameter optimization and NAS as sub-areas within AutoML). Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Key Components of NAS

Search space: The search space provides the boundary within which a specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as a Bayesian method or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods such as early stopping, weight sharing or network morphism.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space. (A toy random-search sketch follows below.)
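To ground the three components, here is a deliberately tiny random-search sketch: the search space is a hand-crafted set of layer depths and widths, the optimization method is random sampling, and the evaluation method is short full training. Real NAS systems are far more sophisticated; everything here is an illustrative assumption.

import random
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Search space: candidate architectures (depth x width), hand-crafted here
space = [tuple([w] * d) for w in (32, 64, 128) for d in (1, 2, 3)]

best_arch, best_acc = None, 0.0
for arch in random.sample(space, 5):                  # optimization method: random search
    model = MLPClassifier(hidden_layer_sizes=arch, max_iter=300, random_state=0)
    model.fit(X_tr, y_tr)                             # evaluation method: (short) full training
    acc = model.score(X_te, y_te)
    if acc > best_acc:
        best_arch, best_acc = arch, acc
print(best_arch, best_acc)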

Components of NAS:
• Search Space: DAG representation, cell block, meta-architecture, NAS-specific
• Optimization Method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization
• Evaluation Method: full training, partial training, weight-sharing, network morphism, hypernetworks

Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend and use cases:
1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless Cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.

Table 2.0: AI Use cases - Infosys Research


Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book
• https://simmachines.com/explainable-ai
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac
• https://www.fast.ai/2018/07/23/auto-ml-3
• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx



About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and Unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

For more information, contact askus@infosys.com

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

Infosys.com | NYSE: INFY | Stay Connected

Page 3: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

H1 of AI ldquocore offeringsrdquo are typically defined as algorithm powered use cases that have become mainstream and will remain major investment areas for the current wave In that respect adoption of use cases such as product or customer recommendations churn and sentiment analysis and leveraging algorithms such as Random Forest Support Vector Machines (SVM) Naiumlve Bayes and n-grams based approaches have been mainstream for some time and will continue to get weaved into varied AI experiences

H2 of AI ldquonew offeringsrdquo use cases are the ones that are currently in experimentative evolutionary mode and will have major impact on Artificial Intelligent systems

that will be mainstream in second wave Convolution Neural Networks (CNN) have laid foundation for several art of possible Computer Vision use cases ranging from object detection image captioning and segmentation to facial recognition Long Term Short Term Memory(LSTM) and Recurrent Neural Nets (RNN) are helping to significantly improve art of possibilities from use cases such as language translations sentence formulation text summarization topic extraction and so on Word vectors based models such as GloVe and Word2Vec are helping in dealing with large multi dimensional text corpuses and finding hidden unspotted complex interwoven relationships and similarity between topics entities and keywords

These H2 AI algorithms are promising interesting new possibilities in various business functions however it is still in nascent stage of adoption and user testing

H3 of AI ldquoemerging offeringsrdquo use cases are the ones that are potential game changers and can unearth new possibilities from AI that are unexplored and unimagined today As these technologies are relatively new it requires more time to establish its weaknesses strengths and nuances In this paper we look at key H3 AI algorithmic trends and how we leverage these in various use cases built as part of our IP Infosys Enterprise Cognitive platform (iECP)

Horizon 1(Mainstream)

Horizon 2(Adopt Scale)

Horizon 3(Envision Invent Disrupt)

bull Explainable AIbull Generative Networksbull Fine Grained Classicationbull Capsule Networksbull Meta Learningbull Transfer Learning (Text)bull Single Shot Learningbull Reinforcement Learningbull Auto MLbull Neural Architecture Search

(NAS)

bull Convolution Neural Networks (CNN)

bull Long Term Short Term Memory (LSTM)

bull Recurrent Neural Networks (RNN)

bull Word2Vecbull GloVebull Transfer Learning (Vision)

bull Logistic Regressionbull Naive Bayesbull Random Forestbull Support Vector Machines

(SVM)bull Collaborative Filteringbull n-grams

Algorithms

Use Cases

bull Scene Captioningbull Scene Detectionbull Store Footfall Countsbull Specic Object Class

Detectionbull Sentence Completionbull Video Scene Predictionbull Auto Learningbull Fake Images Art Generationbull Music Generationbull Data Augmentation

bull Object Detectionbull Face Recognitionbull Product Brand Recognition

Classicationbull Speech Recognitionbull Sentence Completionbull Speech Transcriptionsbull Topic Classicationbull Topic Extractionbull Intent Miningbull Question Extraction

bull Recommendationsbull Predictionbull Document Image

Classicationbull Document Image Clusteringbull Sentiment Analysisbull Named Entity Recognition

(NER)bull Keyword Extractions

Table 10 H3 Algorithms and Usecases- Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Network algorithms are considered

to derive hidden patterns from data that

many other conventional best of breed

Machine Learning algorithms such as

Support Vector Machines Random Forrest

Naiumlve Bayes etc are unable to establish

However there is an increasing rate of

incorrect and unexplainable decisions

and results produced by the Neural

Network algorithms in activities such

as credit lending skilled job hiring and

facial recognition Given this scenario

AI results should be justified explained

and reproduced for consistency and

correctness as some of these results can

Network dissection helps in associating

these established units to concepts

They learn from labeled concepts during

supervised training stages and how and in

what magnitude these are influenced by

channel activations

Several frameworks are currently evolving

to improve the explainability of the

models Two known frameworks in this

space are LIME and SHAP

Explainable AI (XAI)

have profound impact on livelihood

Geoffrey Hinton (University of Toronto)

often called the godfather of deep

learning explains ldquoA deep-learning system

doesnrsquot have any explanatory power The

more powerful the deep-learning system

becomes the more opaque it can becomerdquo

It is to address the issues of transparency

in AI that Explainable AI was developed

Explainable AI (XAI) as a framework

increases the transparency of black-box

algorithms by providing explanations for

the predictions made and can accurately

explain a prediction at the individual level

LIME ( Local Interpretability Model

agnostic Explanations) It treats the

model as a blackbox and tries to create

another surrogate non-linear model where

explainablitiy is supported or feasible

such as SVM Random Forest or Logistic

Regression The surrogate non-linear

model is then used to evaluate different

components of the image by perturbing

the inputs and evaluating its impact

on the result Thereby deciding which

Here are a few approaches provided

through certain frameworks that can help

understand the traceability of results

Feature Visualization as depicted in figure

below helps in visualizing various layers in

a neural network They help establish that

lower layers are useful in learning features

such as edges and textures whereas

higher layer provides more of higher order

abstract concepts such as objects

parts of the image are most important in

arriving at results Since the original model

does not participate directly it is model

independent The challenge with this

approach is that even when the surrogate

model based explanations can be relevant

to the model it is used on it may not be

generalizable precisely or become one to

one mappable to the original model all the

time

External Document copy 2019 Infosys Limited

Edges Textures Patterns Parts Objects

Figure 20 Feature Visualisation Source Olah et al 2017 (CC-BY 40)

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Generative AI will have a potentially

strong role in creative work be it writing

articles creating completely new images

from the existing set of trained models

improving image or video quality merging

images for artistic creations in creating

music or improving dataset through data

generation Generative AI as it matures in

near term will augment many jobs and will

potentially replace many in future

Generative Networks consists of two deep

neural networks a generative network

and a discriminative network They work

together to provide high-level simulation

of conceptual tasks

To train a Generative model we first

collect a large amount of data in some

Generative AI

Steps

Neural Style Transfer (NST)

1 Create set of noisy (perturbed) example

images by disabling certain features

(marking certain portions gray)

2 For each example get the probability

that tree frog is in the image as per

original model

3 Using these created data points train a

domain (eg think millions of images

sentences or sounds etc) and then

train the model to generate similar

data Generative network generates the

data to fool the Discriminative Network

while Discriminative Network learns by

identifying real vs fake data received from

the Generative Network

Generator trains with an objective function

on whether it can fool the discriminator

network whereas discriminator trains on

its ability to not be fooled and correctly

identify real vs fake Both network

learns through back propagation The

generator is typically a deconvolutional

neural network and the discriminator is a

convolutional neural network

Generative Networks can be of multiple

types depending on the objective they are

designed for example being

Figure 30 Explaining a Prediction with LIME Source Pol Ferrando Understanding how LIME explains predictions

Neural Style Transfer (NST) is one of the

Generative AI techniques in deep learning

As seen below it merges two images

namely a content image (C) and a style

image (S) to create a generated image

(G) The generated image G combines the

content of the image C with the style of

image S

simple linear model (Logistic regression

etc) and get the results

4 Superpixels with highest positive

weights becomes an explanation

SHAP (SHapley Additive exPlanations)

It uses a game theory based approach

to predict the outcome by using various

permutations and combinations of features

and their effect on the delta of the result

(predicted - actual) and then computing

average of the score for that feature to

explain the results For image use cases

it marks the dominating feature areas by

coloring the pixels in the image

SHAP produces relatively accurate results

and is more widely used in Explainable AI

as against LIME

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Some of the other GAN variations that are

popular are

Super Resolution GAN (SRGAN) that

helps improve quality of images

Stack-GAN that generates realistic

looking photographs from textual

descriptions of simple objects like birds

and flowers

Sketch-GAN a Generative model for

vector drawings which is a Recurrent

Neural Network (RNN) and is able to

construct stroke-based drawings of

common objects The model is trained

on a dataset of human-drawn images

representing many different classes

eGANs (Evolutionary Generative

Adversarial Networks) that generate

photographs of faces with different

ages from young to old

IcGAN to reconstruct photographs

of faces with specific features such

as changes in hair color style facial

expression and even gender

Content image

Content image

Content image

Colorful circle

Louvre museum

Ancient city of Persepolis

Blue painting

Impressionist style painting

Colorful circle with blue painting style

Louvre painting with impressionist style

Persepolis in Van Gogh style

The Starry Night (Van Gogh)

Style image

Style image

Style image

Generated image

Generated image

Generated image

External Document copy 2019 Infosys Limited

Figure 40 Novel Artistic Images through Neural Style Transfer Source Fisseha Berhane Deep Learning amp Art Neural Style Transfer

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Classification of an object into specific

categories such as car table flower and

such are common in Computer Vision

However establishing the objectsrsquo finer

class based on specific characteristics is

where AI is making rapid progress This is

because granular features of objects are

being trained and used for differentiation

of objects

Examples of Fine Grained Classification are

In Fine Grained Classification the

progression through the 8-layer CNN

network can be thought of as a progression

from low to mid to high-level features

The later layers aggregate more complex

structural information across larger

scalesndashsequences of convolutional layers

Fine Grained Classification Fine grained clothing style finder type

of a shoe etc

Recognizing a car type

Recognizing breed of a dog plant

species insect bird species etc

However fine-grained classification is

challenging due to the difficulty of finding

discriminative features Finding those

subtle traits that fully characterize the

object is not straightforward

Feature representations that better

preserve fine-grained information

Segmentation-based approaches that

facilitate extraction of purer features

and partpose normalized feature

spaces

Pose Normalization Schemes

Fine Grained Classification Approaches

interleaved with max-pooling can capture

deformable parts and fully connected

layers can capture complex co-occurrence

statistics

Bird recognition is one of the major examples in fine grained classification in the below image given a test image

groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models Each region is fed through a deep convolutional network and features are extracted from multiple layers after which they are concatenated and fed to a classifier

Figure 50 Bird Recognition Pipeline Overview Source Branson Van Hoen et al Bird Species Categorization

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

The below pictures and steps depict fine

grained classification approach for car

detection system

a Detects parts using a collection of

unsupervised part detectors

b Outputs a grid of discriminative

features (The CNN is learned with class

Car Detection System using Fine Grained Classification

labels and then truncated retaining

the first two convolutional layers

that retain spatial information) The

appearance of each part detected

using the learned CNN features is

described by pooling in the detected

region of each part

c Appearance of any undetected part

is set to zero This results in Ensemble

of Localized Learned Features (ELLF)

representation which is then used to

predict fine-grained object categories

d A standard CNN passes the output

of the convolutional layers through

several fully connected layers in order

to make a prediction

Convolutional Network are so far the

defacto and well accepted algorithms to

work with image based datasets They

work on the pixels of images using various

size filters (channels) by convolving using

pooling techniques to bubble the stronger

features to derive colors textures edges

and shapes and establish structures

through lower to highest layers

Given the face of a person CNN identifies

the face by establishing eyes ears

eyebrows lips chin etc components

of the face However if the facial image

is provided with incorrect position and

Capsule Network

alignment of eyes and eyebrows or say

eyebrows swaps with lips and ears are

placed on forehead the same CNN trained

algorithm would still go on and detect this

as a human face This is the huge drawback

of CNN algorithm and happens due to

its inability to store the information on

relative position of various objects

Capsule Network invented by Geoffery

Hinton addresses exactly this problem of

CNN by storing the spatial relationships of

various parts

Capsule Network like CNN are multi

layered neural networks consisting of

several capsules each capsule consists

of several neurons Capsules in lower

layers are called primary capsules and are

trained to detect an object (eg triangle

circle) within a given region of image It

outputs a vector that has two properties

Length and Orientation Length represents

the probability of the presence of the

object and Orientation represents the

pose parameters of the object such as

coordinates rotation angle etc

Capsules in higher layers called routing

capsules detect larger and more complex

objects such as eyes ears etc

Figure 60 Car Detection System Source Learning Features and Parts for Fine-Grained Recognition

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Routing by Agreement

Advantage over CNN

Unlike CNN which primarily bubbles higher

order features using max or avg pooling

Capsule Network bubbles up features

using routing by agreement where every

capsule participates in choosing the shape

by voting (democratic election way)

In the figure given above

Lower level corresponds to rectangles

triangles and circles

High level corresponds to houses

boats and cars

If there is an image of a house the

capsules corresponding to rectangles

and triangles will have large activation

vectors Their relative positions (coded

in their instantiation parameters) will bet

on the presence of high-level objects

Since they will agree on the presence of

house the output vector of the house

capsule will become large This in turn

will make the predictions by the rectangle

and the triangle capsules larger This

cycle will repeat 4-5 times after which the

bets on the presence of a house will be

considerably larger than the bets on the

presence of a boat or a car

Advantages over CNN

• Less data for training: Capsule Networks need far less data for training (almost 10 percent of what a CNN needs).

• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.

• Preserved pose and position: They preserve pose and position information, as against CNNs.

• High accuracy: Capsule Networks have higher accuracy as compared to CNNs.

• Reconstruction vs. mere classification: A CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.

• Information retention vs. loss: With a CNN, a kernel for edge detection works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, a CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, however, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research; they are relatively new and mostly tested and benchmarked on the MNIST dataset, but they promise to be the future for the massive use cases emerging from Vision datasets.

Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks


Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Net, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, learning rate, weights, bias, dropouts, and the activation function used to activate a neuron (such as sigmoid, tanh or ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.
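As an illustration of this conventional loop, here is a minimal sketch in PyTorch; the tiny model, the dummy data and all hyper-parameters are placeholders rather than recommendations:

import torch
from torch import nn

# Placeholder data: one batch of 8 fake 28x28 grayscale images, 2 classes.
train_loader = [(torch.randn(8, 1, 28, 28), torch.randint(0, 2, (8,)))]

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):                        # several iterations (epochs)
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # forward pass: computed vs. actual
        loss.backward()                        # backward pass (propagation)
        optimizer.step()                       # readjust ("learn") the weights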

Meta Learning focuses on how to learn to learn, and is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the pattern of the problem, such as those based on the boundary of the metric space, the amount of data, optimizing the size of the neural network, or using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning

This technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; Few Shots Meta-Learning, however, uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning; this is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that focus on improving gradient descent techniques.

Metric Meta-Learning

In this method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging the various optimization parameters established for the given metric space.
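As a toy illustration of this metric-based idea (in the style of prototypical networks, a technique not named in this paper), the sketch below classifies a query by its distance to class prototypes in an assumed embedding space:

import numpy as np

def prototypes(embeddings, labels):
    # One prototype per class: the mean of that class's support embeddings.
    return {c: np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
            for c in set(labels)}

def classify(query, protos):
    # Nearest prototype in the (narrowed) metric space wins.
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))

# Toy 2-D "embeddings" for two classes, and a query point.
support = [np.array([0.0, 0.1]), np.array([0.1, 0.0]), np.array([1.0, 1.1])]
labels = ["cat", "cat", "dog"]
print(classify(np.array([0.9, 1.0]), prototypes(support, labels)))  # dog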

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning already detects various types of vehicles, such as cars, trucks and bicycles, and needs to be trained to detect an airplane as well, then you may have to retrain the full model with images of all the previous objects.

With Transfer Learning, however, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a scenario without Transfer Learning, the model needs to be trained from scratch: during training, the right weights are arrived at through many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, Vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the pre-trained weights of an existing trained model and use significantly fewer images (5 to 10 percent of the images needed to train a ground-up model) for the model to start detecting. As the pre-trained model has already learnt the basics of identifying edges, curves and shapes in its earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.
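Here is a minimal sketch of this approach using Keras; the backbone choice, class count and image size are illustrative assumptions rather than part of the example above:

import tensorflow as tf

# Reuse a pre-trained backbone; its early layers already encode the
# edges, curves and shapes learned on ImageNet.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the existing computed weights

# Add a small new head to learn the higher-order, task-specific features.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # cars, trucks, bicycles, airplanes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # needs far fewer images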


Transfer Learning helps save a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the above-discussed vehicle model to train a facial recognition model.

Another key point with Transfer Learning is that it is important to understand the details of the data on which the underlying model was trained, as Transfer Learning can implicitly push the built-in biases of that data into the newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is purely experimental.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution and so have the ability to learn faster, whereas Transfer Learning is just a few decades old and is still becoming the foundation for new vision and text use cases.


Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning


Single Shot Learning

Humans have the impressive skill of reasoning about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning: otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. A Single Shot Learning based system, however, using an existing pre-trained FaceNet model with a facial-encoding approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with the other image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) images and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of the person for whom the recognition model needs to be trained.

"Positive" is another image of the same person.

"Negative" is an image of a different person.
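A minimal sketch of the distance computation and triplet objective described above; the embed function stands in for a pre-trained FaceNet-style encoder, and the margin/threshold values are illustrative assumptions:

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Train so that d(anchor, positive) + margin < d(anchor, negative).
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)

def same_person(img_a, img_b, embed, threshold=0.7):
    # Verification: Euclidean distance between the 128-dimensional encodings.
    return np.linalg.norm(embed(img_a) - embed(img_b)) < threshold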


Figure 10.0: Encoding approach, inspired by the ML course from Coursera


Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have the objective of maximizing short-term or long-term rewards. This discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots, driverless cars, etc.

In reinforcement learning, the policy π controls what action we should take, while the value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s); a Q-table uses a mathematical function to arrive at the value of a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches; they are useful for modeling relatively simple systems.

[Figure: In Q-learning, a Q-table maps a state to a Q-value for each action; in Deep Q-learning, a deep Q neural network takes the state and outputs the expected (discounted) reward, i.e., a Q-value per action, given that state, with action = policy(state).]

Figure 11.0: Q-learning vs. Deep Q-learning schema, inspired by the Q-learning notebook by Udacity
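To ground the value-based picture, here is a minimal tabular Q-learning sketch; the env object with reset(), step() and actions is a hypothetical Gym-style environment, and the learning rate, discount and exploration values are illustrative:

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # Q-table: (state, action) -> expected discounted reward
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: occasionally explore, otherwise act greedily.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move Q(s, a) toward reward + discounted best next-state value.
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q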


Model Based

In model-based RL, we model the environment: we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human beings at the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network, which tells what moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any data from previous games to train the deep network. Instead, the deep network was trained by picking training samples from AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and perform very large computations that are difficult for a human brain.

Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing the data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online setting; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

[Figure: An AutoML (auto-sklearn) system. Inputs Xtrain, Ytrain, Xtest and a budget feed a meta-learning component and a hand-crafted portfolio, which warm-start an ML pipeline of data preprocessor, feature preprocessor and classifier tuned via Bayesian optimization; an ensemble is then built to predict Ytest.]

Figure 12.0: An example Auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML


Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for the selection of key features, for hyper-parameter optimization, and for speeding up algorithmic outputs.

BOHB (Bayesian Optimization Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple such alternative algorithms are to be executed, the computation dollars needed would grow exponentially; this is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one motivation for several cloud-infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks, such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but this technique has high relevance and usefulness for solving ultra-complex problems.


Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimal hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet, have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and the size of the architecture, all of which govern overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

[Figure: NAS and hyper-parameter optimization shown as sub-areas of AutoML.]


Figure 13.0: NAS within AutoML. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, product identification) would need a different neural network architecture style than Speech (speech transcription, speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures, based on other domains' data and performance. These are also usually handcrafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. The search could be random, or it could use a statistical or Machine Learning approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of each architecture considered by the optimization method. It could be done using a full training approach, or using cheaper specialized methods such as partial training, early stopping, weight sharing, network morphism, etc. A toy sketch combining these three components is given below.
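The sketch below wires the three components together: a hand-defined search space, random search as the optimization method, and a partial-training evaluation under a small budget. The training itself is faked by a stub, since real training is out of scope here; all names and values are illustrative assumptions:

import random

# Search space: the boundary within which architectures are sampled.
SEARCH_SPACE = {
    "n_layers": [2, 4, 8],
    "n_filters": [16, 32, 64],
    "kernel_size": [3, 5],
}

def sample_architecture():
    # Optimization method: random search (could instead be Bayesian,
    # evolutionary, or reinforcement-learning based).
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def train_briefly(arch, epochs):
    # Placeholder for building and partially training the candidate
    # network; returns a fake validation accuracy.
    return random.random()

def evaluate(arch, budget_epochs=2):
    # Evaluation method: partial training under a small epoch budget.
    return train_briefly(arch, epochs=budget_epochs)

best = max((sample_architecture() for _ in range(20)), key=evaluate)
print(best)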

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.

[Figure: Components of NAS. Search space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks.]

Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do (Table 2.0).

Trend: Use cases

1. Explainable AI (XAI): Applicable across domains where results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.

2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation

3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection

4. Capsule Networks: Image Re-construction, Image Comparison/Matching

5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections

6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies

7. Single Shot Learning: Face Recognition, Face Verification

8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections

9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering

10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.


Table 2.0: AI Use cases. Source: Infosys Research


Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.


boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 5: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Steps:

1. Create a set of noisy (perturbed) example images by disabling certain features (marking certain portions gray).

2. For each example, get the probability that the tree frog is in the image as per the original model.

3. Using these created data points, train a simple linear model (logistic regression, etc.) and get the results.

4. The superpixels with the highest positive weights become the explanation.

Figure 3.0: Explaining a Prediction with LIME. Source: Pol Ferrando, Understanding how LIME explains predictions

SHAP (SHapley Additive exPlanations)

SHAP uses a game-theory based approach to explain the outcome: it evaluates various permutations and combinations of features, measures their effect on the delta of the result (predicted minus actual), and then computes the average score for each feature to explain the results. For image use cases, it marks the dominating feature areas by coloring the pixels in the image.

SHAP produces relatively accurate results and is more widely used in Explainable AI than LIME.
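To make the perturb-and-fit procedure in the LIME steps above concrete, here is a minimal, illustrative sketch in plain NumPy plus scikit-learn; the black-box model and the superpixel counts are stand-ins invented for illustration, not part of any LIME library:

import numpy as np
from sklearn.linear_model import LogisticRegression

n_superpixels, n_samples = 20, 500
rng = np.random.default_rng(0)

def black_box_prob(mask):
    # stand-in for the original model's "tree frog" probability on an
    # image whose superpixels are kept (1) or grayed out (0)
    return 1 / (1 + np.exp(-(mask[:3].sum() - 1.5)))

masks = rng.integers(0, 2, size=(n_samples, n_superpixels))   # step 1: perturbed examples
probs = np.array([black_box_prob(m) for m in masks])          # step 2: query original model
surrogate = LogisticRegression().fit(masks, probs > 0.5)      # step 3: fit simple linear model
top = np.argsort(surrogate.coef_[0])[::-1][:3]                # step 4: highest positive weights
print("superpixels that explain the prediction:", top)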

Generative AI

Generative AI will have a potentially strong role in creative work, be it writing articles, creating completely new images from an existing set of trained models, improving image or video quality, merging images for artistic creations, creating music, or improving datasets through data generation. As Generative AI matures, it will in the near term augment many jobs, and will potentially replace many in the future.

Generative Networks consist of two deep neural networks, a generative network and a discriminative network. They work together to provide a high-level simulation of conceptual tasks.

To train a generative model, we first collect a large amount of data in some domain (e.g., millions of images, sentences or sounds) and then train the model to generate similar data. The generative network generates data to fool the discriminative network, while the discriminative network learns by identifying real vs. fake data received from the generative network.

The generator trains with an objective function on whether it can fool the discriminator network, whereas the discriminator trains on its ability to not be fooled and to correctly identify real vs. fake data. Both networks learn through backpropagation. The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network. (A minimal sketch of this training loop appears at the end of this section.)

Generative Networks can be of multiple types depending on the objective they are designed for, one example being Neural Style Transfer.

Neural Style Transfer (NST)

Neural Style Transfer (NST) is one of the Generative AI techniques in deep learning. As seen below, it merges two images, namely a content image (C) and a style image (S), to create a generated image (G). The generated image G combines the content of image C with the style of image S.


Some of the other popular GAN variations are:

• Super Resolution GAN (SRGAN), which helps improve the quality of images.

• Stack-GAN, which generates realistic-looking photographs from textual descriptions of simple objects like birds and flowers.

• Sketch-GAN, a generative model for vector drawings, which is a Recurrent Neural Network (RNN) able to construct stroke-based drawings of common objects. The model is trained on a dataset of human-drawn images representing many different classes.

• eGANs (Evolutionary Generative Adversarial Networks), which generate photographs of faces at different ages, from young to old.

• IcGAN, which reconstructs photographs of faces with specific features, such as changes in hair color, style, facial expression and even gender.

Figure 4.0: Novel Artistic Images through Neural Style Transfer. Each row pairs a content image with a style image to produce a generated image: a colorful circle rendered in a blue painting's style, the Louvre museum rendered in an impressionist painting's style, and the ancient city of Persepolis rendered in the style of Van Gogh's The Starry Night. Source: Fisseha Berhane, Deep Learning & Art: Neural Style Transfer
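Returning to the generator and discriminator loop described at the start of this section, here is a minimal, hedged sketch of adversarial training, assuming PyTorch; the networks, the toy one-dimensional data distribution and the hyper-parameters are illustrative choices, not taken from any paper:

import torch
import torch.nn as nn

# Toy GAN: the generator learns to mimic samples from N(4, 1.5);
# the discriminator learns to tell real samples from generated ones.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.5 + 4.0      # "real" data samples
    fake = G(torch.randn(64, 8))               # generated ("fake") samples

    # Discriminator step: label real as 1 and fake as 0
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()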


Fine Grained Classification

Classification of an object into broad categories, such as car, table or flower, is common in Computer Vision. However, establishing an object's finer class based on specific characteristics is where AI is making rapid progress. This is because granular features of objects are now being trained on and used to differentiate objects.

Examples of Fine Grained Classification are:

• Fine-grained clothing style finder, type of a shoe, etc.

• Recognizing a car type.

• Recognizing the breed of a dog, a plant species, an insect, a bird species, etc.

However, fine-grained classification is challenging due to the difficulty of finding discriminative features: finding the subtle traits that fully characterize the object is not straightforward.

Fine Grained Classification Approaches

• Feature representations that better preserve fine-grained information.

• Segmentation-based approaches that facilitate extraction of purer features, and part/pose normalized feature spaces.

• Pose normalization schemes.

In Fine Grained Classification, the progression through an 8-layer CNN can be thought of as a progression from low- to mid- to high-level features. The later layers aggregate more complex structural information across larger scales: sequences of convolutional layers interleaved with max-pooling can capture deformable parts, and fully connected layers can capture complex co-occurrence statistics.

Bird recognition is one of the major examples of fine-grained classification. In the image below, given a test image, groups of detected keypoints are used to compute multiple warped image regions that are aligned with prototypical models. Each region is fed through a deep convolutional network, and features are extracted from multiple layers, after which they are concatenated and fed to a classifier.

Figure 5.0: Bird Recognition Pipeline Overview. Source: Branson, Van Horn et al., Bird Species Categorization


Car Detection System using Fine Grained Classification

The pictures and steps below depict a fine-grained classification approach for a car detection system:

a. Detect parts using a collection of unsupervised part detectors.

b. Output a grid of discriminative features. (The CNN is learned with class labels and then truncated, retaining the first two convolutional layers, which preserve spatial information.) The appearance of each part detected using the learned CNN features is described by pooling in the detected region of each part.

c. The appearance of any undetected part is set to zero. This results in the Ensemble of Localized Learned Features (ELLF) representation, which is then used to predict fine-grained object categories.

d. A standard CNN, by contrast, passes the output of the convolutional layers through several fully connected layers in order to make a prediction.

Figure 6.0: Car Detection System. Source: Learning Features and Parts for Fine-Grained Recognition
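As a rough illustration of step (b), a pretrained CNN can be truncated after its early convolutional layers so that its output is a spatial grid of local features rather than a single prediction. A minimal sketch assuming torchvision; the network and the cut-off point are illustrative choices, not the ELLF authors' exact setup:

import torch
import torch.nn as nn
import torchvision.models as models

cnn = models.resnet18(pretrained=True)                 # backbone trained on ImageNet
truncated = nn.Sequential(*list(cnn.children())[:5])   # keep stem + first conv block

images = torch.randn(1, 3, 224, 224)    # a dummy batch of one image
feature_grid = truncated(images)        # spatial grid of local features
print(feature_grid.shape)               # torch.Size([1, 64, 56, 56])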

Capsule Network

Convolutional Networks are so far the de facto and well-accepted algorithms for working with image-based datasets. They work on the pixels of images using filters (channels) of various sizes, convolving and using pooling techniques to bubble up the stronger features, deriving colors, textures, edges and shapes, and establishing structures from the lowest to the highest layers.

Given the face of a person, a CNN identifies the face by establishing eyes, ears, eyebrows, lips, chin and other components of the face. However, if the facial image is provided with incorrect positions and alignment of these components (say, the eyebrows swapped with the lips and the ears placed on the forehead), the same trained CNN would still go on to detect this as a human face. This is a huge drawback of the CNN algorithm, and it happens due to CNN's inability to store information about the relative positions of various objects.

Capsule Networks, invented by Geoffrey Hinton, address exactly this problem of CNNs by storing the spatial relationships of various parts.

Capsule Networks, like CNNs, are multi-layered neural networks consisting of several capsules, and each capsule consists of several neurons. Capsules in lower layers, called primary capsules, are trained to detect an object (e.g., a triangle or a circle) within a given region of the image. Each outputs a vector that has two properties: length and orientation. Length represents the probability of the presence of the object, and orientation represents the pose parameters of the object, such as coordinates, rotation angle, etc.

Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as eyes, ears, etc.
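The capsule output described above, a vector whose length encodes presence probability and whose orientation encodes pose, is typically produced with the "squash" nonlinearity from the Sabour, Frosst and Hinton paper cited in Figure 7.0 below. A minimal sketch, assuming PyTorch:

import torch

def squash(s, dim=-1, eps=1e-8):
    # Scales a vector's length into (0, 1), so it can act as a presence
    # probability, while preserving the vector's direction (the pose).
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

u = torch.randn(10, 16)      # ten 16-dimensional capsule outputs
v = squash(u)
print(v.norm(dim=-1))        # all lengths now lie strictly below 1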



Routing by Agreement

Unlike CNNs, which primarily bubble up higher-order features using max or average pooling, Capsule Networks bubble up features using routing by agreement, where every capsule participates in choosing the shape by voting (the way a democratic election works).

In the figure given below:

• The lower level corresponds to rectangles, triangles and circles.

• The higher level corresponds to houses, boats and cars.

If there is an image of a house, the capsules corresponding to rectangles and triangles will have large activation vectors. Their relative positions (coded in their instantiation parameters) will bet on the presence of high-level objects. Since they will agree on the presence of a house, the output vector of the house capsule will become large. This in turn will make the predictions by the rectangle and triangle capsules larger. This cycle will repeat 4-5 times, after which the bets on the presence of a house will be considerably larger than the bets on the presence of a boat or a car.

Advantage over CNN

• Less data for training: Capsule Networks need far less data for training (almost 10%) as compared to CNNs.

• Fewer parameters: the connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation bandwidth.

• Preserve pose and position: they preserve pose and position information, unlike CNNs.

• High accuracy: Capsule Networks have shown higher accuracy as compared to CNNs.

• Reconstruction vs. mere classification: a CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.

• Information retention vs. loss: with a CNN, a kernel for edge detection works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, a CNN works well because there are very few ways to describe an edge. But once we get up to the level of shapes, we do not want to have a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset; still, they are likely to be the future for the massive set of use cases emerging from vision datasets.

Figure 7.0: A simple CapsNet with 3 layers. This model gives results comparable to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for house or boat classification. Source: Beginners' Guide to Capsule Networks


Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) given a set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Net, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, bias, dropouts, and the activation function used to activate the neuron, such as sigmoid, tanh and ReLU. The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the difference in the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering, data and compute intensive.

Meta Learning focuses on how to learn to learn. It is one of the fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan; some need multiple perspectives to strengthen the neural connections for permanent memory; some remember by writing, while some remember through actual experiences. Meta Learning tries to leverage these to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning

This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; however, Few Shots Meta-Learning uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning, explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that focus on improving gradient descent techniques.

Metric Meta-Learning

In this learning method, the metric space is narrowed down to improve the focus of learning. Learning is then carried out only in this metric space, leveraging the various optimization parameters established for the given metric space. (A minimal sketch of this idea appears after this list.)

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM). In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.
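To make the metric-based flavor concrete, here is a minimal nearest-centroid sketch in the spirit of prototypical networks; it is an illustration of ours in plain NumPy, not code from any specific library:

import numpy as np

def nearest_centroid_predict(support_x, support_y, query_x):
    # support_x/support_y: a handful of labelled embeddings per class
    # (the few-shot "support set"); query_x: embeddings to classify
    classes = np.unique(support_y)
    centroids = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(query_x[:, None, :] - centroids[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Two classes with three 4-dimensional examples each; queries drawn near class 1
support_x = np.vstack([np.random.randn(3, 4), np.random.randn(3, 4) + 5.0])
support_y = np.array([0, 0, 0, 1, 1, 1])
print(nearest_centroid_predict(support_x, support_y, np.random.randn(2, 4) + 5.0))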

Transfer Learning (TL)

Humans can learn from their own existing experiences, or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning already detects various types of vehicles, such as cars, trucks and bicycles, and needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no-Transfer-Learning scenario, a model needs to be trained from scratch, and during training the right weights are arrived at through many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with significantly fewer images (roughly 5 to 10 percent of the images needed to train a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basics, identifying edges, curves and shapes in the earlier layers, it needs to learn only the higher-order features specific to airplanes on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.


Transfer Learning saves a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is only mature enough to be applied to similar use cases; that is, you cannot use the vehicle-detection model discussed above to train a facial recognition model.

Another key point during Transfer Learning is to understand the details of the data on which the underlying model was trained, as Transfer Learning can implicitly push the built-in biases of the underlying data into newer systems. It is recommended that the datasheets of underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human-brain rationale earlier, it is important to note that human brains have gone through centuries of experience and gene evolution, giving them the ability to learn faster, whereas transfer learning is just a few decades old and is only now becoming the foundation for new vision and text use cases.

Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning
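As a minimal sketch of the layer-reuse idea in Figure 9.0, assuming PyTorch and torchvision; the two-class "airplane vs. not" head is an illustrative choice of ours:

import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)         # weights learned on ImageNet
for p in model.parameters():
    p.requires_grad = False                      # freeze the pre-trained layers

model.fc = nn.Linear(model.fc.in_features, 2)    # new head: airplane vs. not
# Train as usual: only model.fc receives gradient updates, so far fewer
# images and epochs are needed than when training the full network.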

Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. However, a Single Shot Learning based system, using an existing pre-trained FaceNet model with a facial-encoding approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model so that the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) pair distance is larger.

• "Anchor" is the image of the person for whom the recognition model needs to be trained.

• "Positive" is another image of the same person.

• "Negative" is an image of a different person.


Figure 10.0: Encoding approach, inspired by the Machine Learning course from Coursera
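A hedged sketch of the triplet objective and distance check described above, assuming PyTorch; the margin and threshold values are illustrative:

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive/negative: batches of 128-dimensional face encodings
    pos_d = F.pairwise_distance(anchor, positive)   # same person: should be small
    neg_d = F.pairwise_distance(anchor, negative)   # different person: should be large
    return F.relu(pos_d - neg_d + margin).mean()    # push the gap beyond the margin

def same_person(enc_a, enc_b, threshold=0.7):
    # verification: compare Euclidean distance against a tuned threshold
    return torch.norm(enc_a - enc_b) < threshold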


Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline in which an agent learns to behave in an environment by getting a reward or punishment for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. This discipline uses deep learning techniques to bring human-level performance to a given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.

In reinforcement learning, a policy π controls what action we should take, and a value function v measures how good it is to be in a particular state. The value function tells us the maximum expected future reward the agent will get at each state.

Three Approaches to Reinforcement Learning

Value Based

In value-based RL, the goal is to optimize the value function V(s). A Q-table uses a mathematical function to arrive at a state based on an action. The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent will use this value function to select which state to choose at each step.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time; given the state, action = policy(state). There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches; they are useful for modeling relatively simple systems.

Figure 11.0: Q-learning uses a Q-table that maps a state to the Q value (expected discounted reward) of each action, while Deep Q-learning feeds the state to a deep Q neural network that predicts the Q value of each action. Schema inspired by the Q-learning notebook by Udacity
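To ground the value-based approach, here is a minimal tabular Q-learning sketch on a toy chain environment; the environment, learning rate and schedule are illustrative choices, implementing the standard update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)):

import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))          # the Q-table from Figure 11.0
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    # toy chain: action 1 moves right, action 0 moves left;
    # reaching the last state pays reward 1 and restarts the episode
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

state = 0
for _ in range(5000):
    # epsilon-greedy action selection: mostly exploit, sometimes explore
    action = np.random.randint(n_actions) if np.random.rand() < epsilon else int(Q[state].argmax())
    nxt, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = 0 if nxt == n_states - 1 else nxt

print(Q)   # the learned values favor moving right along the chain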


Model Based

In model-based RL, we model the environment: we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network, which tells what moves are promising, and a value network, which tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training on data from previous games. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

Auto ML (AML)

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automated Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

Figure 12.0: An example of the auto-sklearn pipeline. Given Xtrain, Ytrain, Xtest and a compute budget, meta-learning over a hand-crafted portfolio and Bayesian optimization select a data processor, feature preprocessor and classifier for the ML pipeline, after which an ensemble is built to predict Ytest. Source: André Biedenkapp, We did it Again: World Champions in AutoML


Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing column missing values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, optimizing hyper-parameters, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband)

BOHB combines Bayesian hyper-parameter optimization with bandit-based (Hyperband) search methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are costly enough that even simple Machine Learning workloads, such as a CNN to classify objects, are expensive to run; if multiple alternate algorithms must be executed, the compute cost multiplies quickly. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one motivation for several cloud-infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can augment and speed up certain tasks, such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but the technique has high relevance and usefulness for solving ultra-complex problems.


Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing the Neural Network layers, filters or channels and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles, such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet, have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern the overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

Figure 13.0: NAS is one component of the broader AutoML problem, alongside hyperparameter optimization. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision based use cases (captioning a scene or product identification) would need a different neural network architecture style as against Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) based use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach, such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying certain specialized methods, such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and is not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
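As a toy illustration of the NAS loop, the sketch below uses random search as the optimization method and short, partial training as the evaluation method; the search space, data and budgets are invented for illustration (assumes PyTorch):

import random
import torch
import torch.nn as nn

def random_architecture():
    # search space: 1-3 hidden layers, each 16-128 units wide
    widths = [random.choice([16, 32, 64, 128]) for _ in range(random.randint(1, 3))]
    layers, in_dim = [], 20
    for w in widths:
        layers += [nn.Linear(in_dim, w), nn.ReLU()]
        in_dim = w
    layers.append(nn.Linear(in_dim, 2))
    return nn.Sequential(*layers)

def evaluate(model, x, y, steps=50):
    # partial training as a cheap proxy for the architecture's final quality
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
candidates = [random_architecture() for _ in range(10)]   # optimization: random search
scores = [evaluate(m, x, y) for m in candidates]          # evaluation: partial training
best = candidates[scores.index(min(scores))]              # keep the best architecture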

Figure 14.0: Components of NAS: the search space (DAG representation, cell block, meta-architecture, NAS-specific), the optimization method (reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization) and the evaluation method (full training, partial training, weight-sharing, network morphism, hypernetworks). Source: Liam Li, Ameet Talwalkar, What is neural architecture search?


Addressing H3 AI Trends at Infosys

In this paper, we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend | Use cases

1. Explainable AI (XAI) | Applicable wherever results need to be traced, e.g., tumor detection, mortgage rejection, candidate selection, etc.

2. Generative AI / Neural Style Transfer (NST) | Art generation, sketch generation, image or video resolution improvements, data generation/augmentation, music generation

3. Fine Grained Classification | Vehicle classification, type of tumor detection

4. Capsule Networks | Image re-construction, image comparison/matching

5. Meta Learning | Intelligent agents, continuous learning scenarios for document review and corrections

6. Transfer Learning | Identifying a person not wearing a helmet, logo/brand detection in an image, speech model training for various accents and vocabularies

7. Single Shot Learning | Face recognition, face verification

8. Deep Reinforcement Learning (RL) | Intelligent agents, robots, driverless cars, traffic light monitoring, continuous learning scenarios for document review and corrections

9. Auto ML | Invoice attribute extraction, document classification, document clustering

10. Neural Architecture Search (NAS) | CNN or RNN based use cases, such as image classification, object identification, image segmentation, speaker classification, etc.

Table 2.0: AI Use cases. Source: Infosys Research


References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

Page 6: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Some of the other GAN variations that are

popular are

Super Resolution GAN (SRGAN) that

helps improve quality of images

Stack-GAN that generates realistic

looking photographs from textual

descriptions of simple objects like birds

and flowers

Sketch-GAN a Generative model for

vector drawings which is a Recurrent

Neural Network (RNN) and is able to

construct stroke-based drawings of

common objects The model is trained

on a dataset of human-drawn images

representing many different classes

eGANs (Evolutionary Generative

Adversarial Networks) that generate

photographs of faces with different

ages from young to old

IcGAN to reconstruct photographs

of faces with specific features such

as changes in hair color style facial

expression and even gender

Content image

Content image

Content image

Colorful circle

Louvre museum

Ancient city of Persepolis

Blue painting

Impressionist style painting

Colorful circle with blue painting style

Louvre painting with impressionist style

Persepolis in Van Gogh style

The Starry Night (Van Gogh)

Style image

Style image

Style image

Generated image

Generated image

Generated image

External Document copy 2019 Infosys Limited

Figure 40 Novel Artistic Images through Neural Style Transfer Source Fisseha Berhane Deep Learning amp Art Neural Style Transfer

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Classification of an object into specific

categories such as car table flower and

such are common in Computer Vision

However establishing the objectsrsquo finer

class based on specific characteristics is

where AI is making rapid progress This is

because granular features of objects are

being trained and used for differentiation

of objects

Examples of Fine Grained Classification are

In Fine Grained Classification the

progression through the 8-layer CNN

network can be thought of as a progression

from low to mid to high-level features

The later layers aggregate more complex

structural information across larger

scalesndashsequences of convolutional layers

Fine Grained Classification Fine grained clothing style finder type

of a shoe etc

Recognizing a car type

Recognizing breed of a dog plant

species insect bird species etc

However fine-grained classification is

challenging due to the difficulty of finding

discriminative features Finding those

subtle traits that fully characterize the

object is not straightforward

Feature representations that better

preserve fine-grained information

Segmentation-based approaches that

facilitate extraction of purer features

and partpose normalized feature

spaces

Pose Normalization Schemes

Fine Grained Classification Approaches

interleaved with max-pooling can capture

deformable parts and fully connected

layers can capture complex co-occurrence

statistics

Bird recognition is one of the major examples in fine grained classification in the below image given a test image

groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models Each region is fed through a deep convolutional network and features are extracted from multiple layers after which they are concatenated and fed to a classifier

Figure 50 Bird Recognition Pipeline Overview Source Branson Van Hoen et al Bird Species Categorization

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

The below pictures and steps depict fine

grained classification approach for car

detection system

a Detects parts using a collection of

unsupervised part detectors

b Outputs a grid of discriminative

features (The CNN is learned with class

Car Detection System using Fine Grained Classification

labels and then truncated retaining

the first two convolutional layers

that retain spatial information) The

appearance of each part detected

using the learned CNN features is

described by pooling in the detected

region of each part

c Appearance of any undetected part

is set to zero This results in Ensemble

of Localized Learned Features (ELLF)

representation which is then used to

predict fine-grained object categories

d A standard CNN passes the output

of the convolutional layers through

several fully connected layers in order

to make a prediction

Convolutional Network are so far the

defacto and well accepted algorithms to

work with image based datasets They

work on the pixels of images using various

size filters (channels) by convolving using

pooling techniques to bubble the stronger

features to derive colors textures edges

and shapes and establish structures

through lower to highest layers

Given the face of a person CNN identifies

the face by establishing eyes ears

eyebrows lips chin etc components

of the face However if the facial image

is provided with incorrect position and

Capsule Network

alignment of eyes and eyebrows or say

eyebrows swaps with lips and ears are

placed on forehead the same CNN trained

algorithm would still go on and detect this

as a human face This is the huge drawback

of CNN algorithm and happens due to

its inability to store the information on

relative position of various objects

Capsule Network invented by Geoffery

Hinton addresses exactly this problem of

CNN by storing the spatial relationships of

various parts

Capsule Network like CNN are multi

layered neural networks consisting of

several capsules each capsule consists

of several neurons Capsules in lower

layers are called primary capsules and are

trained to detect an object (eg triangle

circle) within a given region of image It

outputs a vector that has two properties

Length and Orientation Length represents

the probability of the presence of the

object and Orientation represents the

pose parameters of the object such as

coordinates rotation angle etc

Capsules in higher layers called routing

capsules detect larger and more complex

objects such as eyes ears etc

Figure 60 Car Detection System Source Learning Features and Parts for Fine-Grained Recognition

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Routing by Agreement

Advantage over CNN

Unlike CNN which primarily bubbles higher

order features using max or avg pooling

Capsule Network bubbles up features

using routing by agreement where every

capsule participates in choosing the shape

by voting (democratic election way)

In the figure given above

Lower level corresponds to rectangles

triangles and circles

High level corresponds to houses

boats and cars

If there is an image of a house the

capsules corresponding to rectangles

and triangles will have large activation

vectors Their relative positions (coded

in their instantiation parameters) will bet

on the presence of high-level objects

Since they will agree on the presence of

house the output vector of the house

capsule will become large This in turn

will make the predictions by the rectangle

and the triangle capsules larger This

cycle will repeat 4-5 times after which the

bets on the presence of a house will be

considerably larger than the bets on the

presence of a boat or a car

Advantages over CNN

• Less data for training: Capsule Networks need much less data for training (almost 10 percent) as compared to CNNs.

• Fewer parameters: The connections between layers require fewer parameters, as a capsule groups neurons, resulting in relatively less computation and bandwidth.

• Preserved pose and position: They preserve pose and position information, as against CNNs.

• Higher accuracy: Capsule Networks have shown higher accuracy as compared to CNNs.

• Reconstruction vs. mere classification: A CNN helps you classify images but not reconstruct the same image, whereas Capsule Networks help you reconstruct the exact image.

• Information retention vs. loss: With a CNN, a kernel for edge detection works only on a specific angle, and each angle requires a corresponding kernel. When dealing with edges, a CNN works well because there are very few ways to describe an edge. Once we get up to the level of shapes, however, we do not want a kernel for every angle of rectangles, ovals, triangles and so on. It would get unwieldy, and would become even worse when dealing with more complicated shapes that have 3-dimensional rotations and features like lighting; this is the reason why traditional neural nets do not handle unseen rotations effectively.

Capsule Networks are best suited for object detection and image segmentation, as they better model hierarchical relationships and provide high accuracy. However, Capsule Networks are still under research, relatively new, and mostly tested and benchmarked on the MNIST dataset; even so, they look set to play a major role in the massive use cases emerging from vision datasets.

Figure 7.0: A simple CapsNet with 3 layers. This model gives comparable results to deep convolutional networks. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton

Figure 8.0: Capsule Network for House or Boat classification. Source: Beginners' Guide to Capsule Networks

Meta Learning

Traditional methods of learning in Machine Learning focus on taking a huge labeled dataset and then learning to predict y (the dependent variable, say classifying an image as cat or dog) from a given set of x (the independent variables, images of cats and dogs). This process involves selecting an algorithm, such as a Convolutional Neural Network, and arriving at various hyper-parameters, such as the number of layers in the network, the number of neurons in each layer, the learning rate, weights, biases, dropouts, and the activation function that activates the neurons (sigmoid, tanh, ReLU). The learning happens through several iterations of forward and backward passes (propagation), readjusting (also called learning) the weights based on the loss (actual vs. computed). At the minimal loss, the weights and other network parameters are frozen and considered the final model for future prediction tasks. This is obviously a long and tedious process, and repeating it for every use case or task is engineering-, data- and compute-intensive.

Meta Learning focuses on how to learn to learn, and it is one of the most fascinating disciplines of artificial intelligence. Human beings have varying styles of learning: some people learn and memorize with one instance of a visual or auditory scan, some need multiple perspectives to strengthen the neural connections for permanent memory, some remember by writing, while others remember through actual experiences. Meta Learning tries to leverage these styles to build its learning characteristics.

Like the variety in human learning techniques, Meta Learning also uses various learning methods based on the patterns of problems, such as those based on boundary space or the amount of data, by optimizing the size of the neural network, or by using a recurrent network approach. Each of these is briefly discussed below.

Types of Meta-Learning Models

Few Shots Meta-Learning

This learning technique focuses on learning from a few instances of data. Typically, neural nets need millions of data points to learn; Few Shots Meta-Learning instead uses only a few instances of data to build models. An example is facial recognition systems using Single Shot Learning, which is explained in detail in the Single Shot Learning section.

Optimizer Meta-Learning

In this method, the emphasis is on optimizing the neural network and its hyper-parameters. A great example of optimizer meta-learning is models that are focused on improving gradient descent techniques.

Metric Meta-Learning

In this learning method, the metric space is narrowed down to improve the focus of learning. The learning is then carried out only in this metric space, leveraging various optimization parameters that are established for the given metric space.

Recurrent Model Meta-Learning

This type of meta-learning model is tailored to Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) networks. In this architecture, the meta-learner algorithm trains an RNN model to process a dataset sequentially and then process new inputs from the task. In an image classification setting, this might involve passing in the set of (image, label) pairs of a dataset sequentially, followed by new examples which must be classified. Meta-Reinforcement Learning is an example of this approach.

Transfer Learning (TL)

Humans can learn from their own existing experiences or from experiences they have heard, seen or observed. The Transfer Learning discipline of AI is based on similar traits of human learning, where new models can learn from and benefit from existing trained models.

For example, if a Computer Vision based detection model with no Transfer Learning that already detects various types of vehicles, such as cars, trucks and bicycles, needs to be trained to detect an airplane, then you may have to retrain the full model with images of all the previous objects. However, with Transfer Learning, you can introduce an additional layer on top of the existing pre-trained layers to start detecting airplanes.

Typically, in a no Transfer Learning scenario, the model needs to be trained from scratch, and during training the right weights are arrived at by doing many iterations (epochs) of forward and back propagation, which takes a significant amount of computation power and time. In addition, Vision models need a significant amount of image data, in this example images of airplanes, to be trained.

With the Transfer Learning approach, you can reuse the existing pre-trained weights of an existing trained model, with significantly fewer images (5 to 10 percent of the images needed for training a ground-up model), for the model to start detecting. As the pre-trained model has already learnt some basics around identifying edges, curves and shapes in its earlier layers, it needs to learn only the higher-order features specific to airplanes, on top of the existing computed weights. In brief, Transfer Learning helps eliminate the need to learn everything from scratch.

Transfer Learning saves a significant amount of data, computational power and time when training new models, as they leverage pre-trained weights from existing trained models and architectures. However, it is important to understand that the Transfer Learning approach today is mature enough to be applied only to similar use cases; that is, you cannot use the above discussed vehicle model to train a facial recognition model.

Another key consideration during Transfer Learning is to understand the details of the data on which the underlying model was trained, as it can implicitly push built-in biases from that data into newer systems. It is recommended that the datasheets of the underlying models and data be studied thoroughly, unless the usage is for experimental purposes.

Having used the human brain rationale earlier, it is important to note that human brains have gone through centuries of experiences and gene evolution and have the ability to learn faster, whereas transfer learning is just a few decades old and is becoming the foundation for new vision and text use cases.
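As an illustration of this layering idea, here is a minimal Keras sketch that reuses a pre-trained image backbone and trains only a new head for a hypothetical vehicle-plus-airplane classifier; the dataset object and class count are assumptions for the example, not part of the paper:

import tensorflow as tf

# Load a backbone pre-trained on ImageNet, without its classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the layers that already know edges and shapes

# Add a new trainable head for our classes (e.g., cars, trucks, bicycles, airplanes)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # hypothetical 4 classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train_ds: a small labeled dataset (5-10%)

Because only the small new head is trained, a fraction of the original data volume and compute can suffice, which is exactly the saving described above.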

Figure 9.0: Transfer Learning Layers. Source: John Cherrie, Training Deep Learning Models with Transfer Learning

Single Shot Learning

Humans have the impressive skill to reason about new concepts and experiences from just a single example. They have the ability for one-shot generalization: the aptitude to encounter a new concept, understand its structure, and then generate compelling alternative variations of the same.

Facial recognition systems are good candidates for Single Shot Learning; otherwise, needing tens of thousands of individual face images to train one neural network can be extremely costly, time consuming and infeasible. A Single Shot Learning based system using an existing pre-trained FaceNet model, with a facial-encoding based approach on top of it, can be very effective at establishing face similarity by computing the distance between faces.

In this approach, a 128-dimensional encoding of each face image is generated and compared with another image's encoding to determine whether the person is the same or different. Various distance-based algorithms, such as Euclidean distance, can be used to determine whether the encodings are within a specified threshold. The model training approach involves creating pairs of (Anchor, Positive) and (Anchor, Negative) and training the model in a way where the (Anchor, Positive) pair distance is smaller and the (Anchor, Negative) distance is larger.

"Anchor" is the image of a person for whom the recognition model needs to be trained.

"Positive" is another image of the same person.

"Negative" is an image of a different person.
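A minimal sketch of the verification step follows. The embed() helper is hypothetical, standing in for a pre-trained FaceNet-style model that returns 128-dimensional encodings, and the threshold value is illustrative:

import numpy as np

def embed(image) -> np.ndarray:
    """Hypothetical wrapper around a pre-trained FaceNet-style model
    that returns a 128-dimensional encoding for a face image."""
    raise NotImplementedError

def same_person(image_a, image_b, threshold=0.7):
    # Two faces match if their encodings are closer than the threshold
    distance = np.linalg.norm(embed(image_a) - embed(image_b))
    return distance < threshold

During training, the triplet objective pushes the squared distance between the Anchor and Positive encodings below the Anchor-Negative distance by at least a margin, so that the simple threshold test above becomes reliable.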

Figure 10.0: Encoding approach, inspired by the ML course from Coursera

Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by getting rewards or punishments for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards, and the discipline uses deep learning techniques to bring human-level performance to the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots and driverless cars.

Three Approaches to Reinforcement Learning

In reinforcement learning, a policy π controls what action we should take, while a value function v measures how good it is to be in a particular state.

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state; the value function tells us the maximum expected future reward the agent will get at each state. The agent uses this value function to select which state to choose at each step. A Q-table uses a mathematical function to arrive at the value of a state based on an action.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time: action = policy(state).

There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches, and they are useful for modeling relatively simple systems.

[Figure: In Q-learning, a Q-table maps a state to a Q-value per action; in Deep Q-learning, a deep Q neural network takes the state and outputs the expected discounted reward (Q-value) for each action.]

Figure 11.0: Schema inspired by the Q-learning notebook by Udacity
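The update rule behind both boxes in the figure is the same. Here is a minimal tabular Q-learning sketch on an assumed toy environment; the sizes and hyper-parameters are illustrative, not from the paper:

import numpy as np

n_states, n_actions = 16, 4          # hypothetical grid-world sizes
Q = np.zeros((n_states, n_actions))  # the Q-table: expected discounted reward
lr, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

def choose_action(state):
    # Epsilon-greedy: mostly exploit the Q-table, sometimes explore
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Move Q(s, a) toward the bootstrapped target r + gamma * max Q(s', .)
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += lr * (target - Q[state, action])

Deep Q-learning replaces the table Q with a neural network that maps a state to the Q-values of all actions, which is what makes large state spaces tractable.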

Model Based

In model-based RL, we model the environment; that is, we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation defined based on the environment's behavior, and it must be sufficiently generalized to counter new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human players in the game of Go. The training accuracy was just 57 percent, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training from previous games' data to train the deep network. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against themselves, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and do very large computations that are difficult for a human brain.

Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing it; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automatic Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

[Figure: An AutoML system takes Xtrain, Ytrain, Xtest and a budget; meta-learning and a hand-crafted portfolio warm-start an ML pipeline of data processor, feature preprocessor and classifier, tuned with Bayesian optimization; an ensemble is then built to produce Ytest.]

Figure 12.0: An example of an Auto-sklearn pipeline. Source: André Biedenkapp, We did it Again: World Champions in AutoML

Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature-processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta-knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, optimizing hyper-parameters, and speeding up algorithmic outputs.

BOHB (Bayesian Optimization Hyperband searches)

BOHB combines Bayesian hyper-parameter optimization with bandit methods for faster convergence.

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternate algorithms and compute results. At present, GPU resources are extremely costly even for executing simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple alternate algorithms are to be executed, the compute dollars needed would grow exponentially; this is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: one, the maturity of the AutoML pipeline, and two, more importantly, how quickly GPU clusters become cheap, the second being the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but the technique has high relevance and usefulness for solving ultra-complex problems.

Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification based on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is also a skill, due to the various influencers, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, that govern overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture to solve a given problem.

[Figure: NAS and hyperparameter optimization shown as parts of AutoML.]

Figure 13.0: Source: Liam Li, Ameet Talwalkar, What is neural architecture search?

Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, or product identification) would need a different neural network architecture style than Speech (speech transcription, or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures, based on data and performance from other domains. These are also usually hand-crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidate architectures could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done using a full training approach, or by doing partial training and then applying specialized methods such as early stopping, weight sharing, network morphism, etc. A minimal sketch of how these three components interact is shown below.
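The sketch uses random search as the optimization method and a partial-training proxy as the evaluation method; the search space, the score() helper and the budget are illustrative assumptions, not a specific NAS library:

import random

# Search space: a toy catalog of architecture choices (illustrative)
SEARCH_SPACE = {
    "num_layers": [4, 8, 16],
    "filters": [32, 64, 128],
    "kernel_size": [3, 5],
}

def sample_architecture():
    # Optimization method: random search over the search space
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def score(arch) -> float:
    """Evaluation method: hypothetical proxy that partially trains the
    candidate architecture for a few epochs and returns validation accuracy."""
    raise NotImplementedError

def nas_random_search(budget=20):
    # The budget bounds the compute spent on the whole search
    best_arch, best_score = None, float("-inf")
    for _ in range(budget):
        arch = sample_architecture()
        s = score(arch)
        if s > best_score:
            best_arch, best_score = arch, s
    return best_arch

Stronger optimization methods (Bayesian optimization, reinforcement learning, evolutionary search) replace the random sampling step, and cheaper evaluation methods replace full training, but the loop stays the same.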

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.

[Figure: Components of NAS — Search space: DAG representation, cell block, meta-architecture, NAS-specific; Optimization method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization; Evaluation method: full training, partial training, weight-sharing, network morphism, hypernetworks.]

Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?

Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend → Use cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g., tumor detection, mortgage rejection, candidate selection, etc.

2. Generative AI / Neural Style Transfer (NST): Art generation, sketch generation, image or video resolution improvements, data generation/augmentation, music generation

3. Fine Grained Classification: Vehicle classification, type-of-tumor detection

4. Capsule Networks: Image re-construction, image comparison/matching

5. Meta Learning: Intelligent agents, continuous-learning scenarios for document review and corrections

6. Transfer Learning: Identifying a person not wearing a helmet, logo/brand detection in images, speech model training for various accents and vocabularies

7. Single Shot Learning: Face recognition, face verification

8. Deep Reinforcement Learning (RL): Intelligent agents, robots, driverless cars, traffic light monitoring, continuous-learning scenarios for document review and corrections

9. Auto ML: Invoice attribute extraction, document classification, document clustering

10. Neural Architecture Search (NAS): CNN or RNN based use cases such as image classification, object identification, image segmentation, speaker classification, etc.

Table 2.0: AI Use cases. Source: Infosys Research

References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com.

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

Page 7: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Classification of an object into specific

categories such as car table flower and

such are common in Computer Vision

However establishing the objectsrsquo finer

class based on specific characteristics is

where AI is making rapid progress This is

because granular features of objects are

being trained and used for differentiation

of objects

Examples of Fine Grained Classification are

In Fine Grained Classification the

progression through the 8-layer CNN

network can be thought of as a progression

from low to mid to high-level features

The later layers aggregate more complex

structural information across larger

scalesndashsequences of convolutional layers

Fine Grained Classification Fine grained clothing style finder type

of a shoe etc

Recognizing a car type

Recognizing breed of a dog plant

species insect bird species etc

However fine-grained classification is

challenging due to the difficulty of finding

discriminative features Finding those

subtle traits that fully characterize the

object is not straightforward

Feature representations that better

preserve fine-grained information

Segmentation-based approaches that

facilitate extraction of purer features

and partpose normalized feature

spaces

Pose Normalization Schemes

Fine Grained Classification Approaches

interleaved with max-pooling can capture

deformable parts and fully connected

layers can capture complex co-occurrence

statistics

Bird recognition is one of the major examples in fine grained classification in the below image given a test image

groups of detected key points are used to compute multiple warped image regions that are aligned with prototypical models Each region is fed through a deep convolutional network and features are extracted from multiple layers after which they are concatenated and fed to a classifier

Figure 50 Bird Recognition Pipeline Overview Source Branson Van Hoen et al Bird Species Categorization

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

The below pictures and steps depict fine

grained classification approach for car

detection system

a Detects parts using a collection of

unsupervised part detectors

b Outputs a grid of discriminative

features (The CNN is learned with class

Car Detection System using Fine Grained Classification

labels and then truncated retaining

the first two convolutional layers

that retain spatial information) The

appearance of each part detected

using the learned CNN features is

described by pooling in the detected

region of each part

c Appearance of any undetected part

is set to zero This results in Ensemble

of Localized Learned Features (ELLF)

representation which is then used to

predict fine-grained object categories

d A standard CNN passes the output

of the convolutional layers through

several fully connected layers in order

to make a prediction

Convolutional Network are so far the

defacto and well accepted algorithms to

work with image based datasets They

work on the pixels of images using various

size filters (channels) by convolving using

pooling techniques to bubble the stronger

features to derive colors textures edges

and shapes and establish structures

through lower to highest layers

Given the face of a person CNN identifies

the face by establishing eyes ears

eyebrows lips chin etc components

of the face However if the facial image

is provided with incorrect position and

Capsule Network

alignment of eyes and eyebrows or say

eyebrows swaps with lips and ears are

placed on forehead the same CNN trained

algorithm would still go on and detect this

as a human face This is the huge drawback

of CNN algorithm and happens due to

its inability to store the information on

relative position of various objects

Capsule Network invented by Geoffery

Hinton addresses exactly this problem of

CNN by storing the spatial relationships of

various parts

Capsule Network like CNN are multi

layered neural networks consisting of

several capsules each capsule consists

of several neurons Capsules in lower

layers are called primary capsules and are

trained to detect an object (eg triangle

circle) within a given region of image It

outputs a vector that has two properties

Length and Orientation Length represents

the probability of the presence of the

object and Orientation represents the

pose parameters of the object such as

coordinates rotation angle etc

Capsules in higher layers called routing

capsules detect larger and more complex

objects such as eyes ears etc

Figure 60 Car Detection System Source Learning Features and Parts for Fine-Grained Recognition

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Routing by Agreement

Advantage over CNN

Unlike CNN which primarily bubbles higher

order features using max or avg pooling

Capsule Network bubbles up features

using routing by agreement where every

capsule participates in choosing the shape

by voting (democratic election way)

In the figure given above

Lower level corresponds to rectangles

triangles and circles

High level corresponds to houses

boats and cars

If there is an image of a house the

capsules corresponding to rectangles

and triangles will have large activation

vectors Their relative positions (coded

in their instantiation parameters) will bet

on the presence of high-level objects

Since they will agree on the presence of

house the output vector of the house

capsule will become large This in turn

will make the predictions by the rectangle

and the triangle capsules larger This

cycle will repeat 4-5 times after which the

bets on the presence of a house will be

considerably larger than the bets on the

presence of a boat or a car

Less data for training - Capsule

Networks need very less data for

training (almost 10) as compared to

CNN

Fewer parameters The connections

between layers require fewer

parameters as capsule groups neurons

resulting in relatively less computations

bandwidth

Preserve pose and position - They

preserve pose and position information

as against CNN

High accuracy - Capsule Networks

have higher accuracy as compared to

CNNs

Reconstruction vs mere classification

- CNN helps you to classify the images

but not reconstruct the same image

whereas Capsule Networks help you to

reconstruct the exact image

Information retention vs loss - With

CNN during edge detection kernel

for edge detection works only on a

specific angle and each angle requires

a corresponding kernel When dealing

with edges CNN works well because

there are very few ways to describe

an edge Once we get up to the level

of shapes we do not want to have a

kernel for every angle of rectangles

ovals triangles and so on It would

get unwieldy and would become

even worse when dealing with more

complicated shapes that have 3

dimensional rotations and features like

lighting the reason why traditional

neural nets do not handle unseen

rotations effectively

Capsule Networks are best suited for

object detection and image segmentation

while it helps better model hierarchical

relationships and provides high accuracy

However Capsule Networks are still under

research and relatively new and mostly

tested and benchmarked on MNIST

dataset but they will be the future in

working with massive use cases emerging

from Vision datasets

Figure 70 A simple CapsNet with 3 layers This model gives comparable results to deep convolutional networks Source Dynamic

Routing Between Capsules Sara Sabour Nicholas Frosst Geoffrey E Hinton

Figure 80 Capsule Network for House or Boat classification Source Beginnersrsquo Guide to Capsule Networks

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Traditional methods of learning in Machine

Learning focuses on taking a huge labeled

dataset and then learning to detect y

(dependent variable say classifying an

image as cat or dog) and given set of x

(independent variables images of cats

and dogs) This process involves selection

of an algorithm such as Convolution

Neural Net and arriving at various hyper

parameters such as number of layers

in the network number of neurons in

each layer learning rate weights bias

dropouts activation function to activate

the neuron such as sigmoid tanh and

Relu The learning happens through

several iterations of forward and backward

passes (propagation) by readjusting (also

called learning) the weights based on

difference in the loss (actual vs computed)

At the minimal loss the weights and

other network parameters are frozen

and are considered final model for future

prediction tasks This is obviously a long

and tedious process and repeating this for

every use case or task is engineering data

and compute intensive

Meta Learning focuses on how to learn to

learn It is one of the fascinating discipline

of artificial intelligence Human beings

have varying styles of learning Some

Humans can learn from their own existing

experiences or experiences they have

heard seen or observed Transfer Learning

discipline of AI is based on similar traits of

human learning where new models can

learn and benefit from existing trained

model

For example if a Computer Vision based

detection model with no Transfer

Learning that already detects various

types of vehicles such as cars trucks and

bicycles needs to be trained to detect an

airplane then you may have to retrain the

full model with images of all the previous

objects

Like the variety in human learning

techniques Meta Learning also uses

various learning methods based on

patterns of problems such as those based

on boundary space amount of data by

optimizing size of neural network or using

recurrent network approach Each of these

are briefly discussed inline

Few Shots Meta-Learning

This learning technique focuses on

learning from a few instances of data

Typically Neural Nets need millions of

data points to learn however Few Shots

Meta- Learning uses only a few instances

of data to build models Examples being

Facial recognition systems using Single

Shot Learning this is explained in detail in

Single Shot Learning section

Optimizer Meta-Learning

In this method the emphasis is on

optimizing the neural network and its

hyper- parameters A great example of

optimizer meta-learning are models that

are focused on improving gradient descent

techniques

Metric Meta-Learning

In this learning method the metric space

is narrowed down to improve the focus of

learning Then the learning is carried out

only in this metric space by leveraging

various optimization parameters that are

established for the given metric space

Recurrent Model Meta-Learning

This type of meta-learning model is tailored

to Recurrent Neural Networks(RNNs)

such as Long-Short-Term-Memory(LSTM)

In this architecture the meta-learner

algorithm will train a RNN model to process

a dataset sequentially and then process

new inputs from the task In an image

classification setting this might involve

passing in the set of (image label) pairs

of a dataset sequentially followed by new

examples which must be classified Meta-

Reinforcement Learning is an example of

this approach

Meta Learning

Transfer Learning (TL)

Types of Meta-Learning Models

people learn and memorize with one

instance of visual or auditory scan Some

people need multiple perspectives to

strengthen the neural connections for

permanent memory Some remember by

writing while some remember through

actual experiences Meta Learning tries

to leverage these to build its learning

characteristics

However with Transfer Learning you can

introduce an additional layer on top of the

existing pre-trained layer to start detecting

airplanes

Typically in a no Transfer Learning

scenario model needs to be trained and

during training right weights are arrived

at by doing many iterations (epochs) of

forward and back propagation which takes

significant amount of computation power

and time In addition Vision models need

significant amount of image data such as

in this example images of airplanes to be

trained

With Transfer Learning approach you can

reuse the existing pre-trained weights of an

existing trained model with significantly

less number of images ( 5 to 10 percent

of actual images needed for training

ground up model) for the model to start

detecting As the pre-trained model has

already learnt some basic learning around

identifying edges curves and shapes in the

earlier layers it needs to learn only higher

order features specific to airplanes with

the existing computed weights In brief

Transfer Learning helps eliminate the need

to learn anything from scratch

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Transfer Learning helps in saving

significant amount of data computational

power and time in training new models as

they leverage pre-trained weights from the

existing trained models and architectures

However it is important to understand

that Transfer Learning approach today

is only matured enough to be applied to

similar use cases that is you cannot use

the above discussed model to train a facial

recognition model

Another key thing during Transfer Learning

is that it is important to understand the

details of the data on which new use cases

are being trained as it can implicitly push

the built-in biases from the underlying data

into newer systems It is recommended

that the datasheets of underlying models

and data be studied thoroughly unless the

usage is for experimentative purpose

Earlier having used the human brain

rationale it is important to note that

human brains have gone through centuries

of experiences and gene evolution and has

the ability to learn faster whereas transfer

learning is just a few decades old and is

becoming ground for new vision and text

use cases

External Document copy 2019 Infosys Limited

Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

Page 8: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

The below pictures and steps depict fine

grained classification approach for car

detection system

a Detects parts using a collection of

unsupervised part detectors

b Outputs a grid of discriminative

features (The CNN is learned with class

Car Detection System using Fine Grained Classification

labels and then truncated retaining

the first two convolutional layers

that retain spatial information) The

appearance of each part detected

using the learned CNN features is

described by pooling in the detected

region of each part

c Appearance of any undetected part

is set to zero This results in Ensemble

of Localized Learned Features (ELLF)

representation which is then used to

predict fine-grained object categories

d A standard CNN passes the output

of the convolutional layers through

several fully connected layers in order

to make a prediction

Convolutional Network are so far the

defacto and well accepted algorithms to

work with image based datasets They

work on the pixels of images using various

size filters (channels) by convolving using

pooling techniques to bubble the stronger

features to derive colors textures edges

and shapes and establish structures

through lower to highest layers

Given the face of a person CNN identifies

the face by establishing eyes ears

eyebrows lips chin etc components

of the face However if the facial image

is provided with incorrect position and

Capsule Network

alignment of eyes and eyebrows or say

eyebrows swaps with lips and ears are

placed on forehead the same CNN trained

algorithm would still go on and detect this

as a human face This is the huge drawback

of CNN algorithm and happens due to

its inability to store the information on

relative position of various objects

Capsule Network invented by Geoffery

Hinton addresses exactly this problem of

CNN by storing the spatial relationships of

various parts

Capsule Network like CNN are multi

layered neural networks consisting of

several capsules each capsule consists

of several neurons Capsules in lower

layers are called primary capsules and are

trained to detect an object (eg triangle

circle) within a given region of image It

outputs a vector that has two properties

Length and Orientation Length represents

the probability of the presence of the

object and Orientation represents the

pose parameters of the object such as

coordinates rotation angle etc

Capsules in higher layers called routing

capsules detect larger and more complex

objects such as eyes ears etc

Figure 60 Car Detection System Source Learning Features and Parts for Fine-Grained Recognition

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Routing by Agreement

Advantage over CNN

Unlike CNN which primarily bubbles higher

order features using max or avg pooling

Capsule Network bubbles up features

using routing by agreement where every

capsule participates in choosing the shape

by voting (democratic election way)

In the figure given above

Lower level corresponds to rectangles

triangles and circles

High level corresponds to houses

boats and cars

If there is an image of a house the

capsules corresponding to rectangles

and triangles will have large activation

vectors Their relative positions (coded

in their instantiation parameters) will bet

on the presence of high-level objects

Since they will agree on the presence of

house the output vector of the house

capsule will become large This in turn

will make the predictions by the rectangle

and the triangle capsules larger This

cycle will repeat 4-5 times after which the

bets on the presence of a house will be

considerably larger than the bets on the

presence of a boat or a car

Less data for training - Capsule

Networks need very less data for

training (almost 10) as compared to

CNN

Fewer parameters The connections

between layers require fewer

parameters as capsule groups neurons

resulting in relatively less computations

bandwidth

Preserve pose and position - They

preserve pose and position information

as against CNN

High accuracy - Capsule Networks

have higher accuracy as compared to

CNNs

Reconstruction vs mere classification

- CNN helps you to classify the images

but not reconstruct the same image

whereas Capsule Networks help you to

reconstruct the exact image

Information retention vs loss - With

CNN during edge detection kernel

for edge detection works only on a

specific angle and each angle requires

a corresponding kernel When dealing

with edges CNN works well because

there are very few ways to describe

an edge Once we get up to the level

of shapes we do not want to have a

kernel for every angle of rectangles

ovals triangles and so on It would

get unwieldy and would become

even worse when dealing with more

complicated shapes that have 3

dimensional rotations and features like

lighting the reason why traditional

neural nets do not handle unseen

rotations effectively

Capsule Networks are best suited for

object detection and image segmentation

while it helps better model hierarchical

relationships and provides high accuracy

However Capsule Networks are still under

research and relatively new and mostly

tested and benchmarked on MNIST

dataset but they will be the future in

working with massive use cases emerging

from Vision datasets

Figure 70 A simple CapsNet with 3 layers This model gives comparable results to deep convolutional networks Source Dynamic

Routing Between Capsules Sara Sabour Nicholas Frosst Geoffrey E Hinton

Figure 80 Capsule Network for House or Boat classification Source Beginnersrsquo Guide to Capsule Networks

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Traditional methods of learning in Machine

Learning focuses on taking a huge labeled

dataset and then learning to detect y

(dependent variable say classifying an

image as cat or dog) and given set of x

(independent variables images of cats

and dogs) This process involves selection

of an algorithm such as Convolution

Neural Net and arriving at various hyper

parameters such as number of layers

in the network number of neurons in

each layer learning rate weights bias

dropouts activation function to activate

the neuron such as sigmoid tanh and

Relu The learning happens through

several iterations of forward and backward

passes (propagation) by readjusting (also

called learning) the weights based on

difference in the loss (actual vs computed)

At the minimal loss the weights and

other network parameters are frozen

and are considered final model for future

prediction tasks This is obviously a long

and tedious process and repeating this for

every use case or task is engineering data

and compute intensive

Meta Learning focuses on how to learn to

learn It is one of the fascinating discipline

of artificial intelligence Human beings

have varying styles of learning Some

Humans can learn from their own existing

experiences or experiences they have

heard seen or observed Transfer Learning

discipline of AI is based on similar traits of

human learning where new models can

learn and benefit from existing trained

model

For example if a Computer Vision based

detection model with no Transfer

Learning that already detects various

types of vehicles such as cars trucks and

bicycles needs to be trained to detect an

airplane then you may have to retrain the

full model with images of all the previous

objects

Like the variety in human learning

techniques Meta Learning also uses

various learning methods based on

patterns of problems such as those based

on boundary space amount of data by

optimizing size of neural network or using

recurrent network approach Each of these

are briefly discussed inline

Few Shots Meta-Learning

This learning technique focuses on

learning from a few instances of data

Typically Neural Nets need millions of

data points to learn however Few Shots

Meta- Learning uses only a few instances

of data to build models Examples being

Facial recognition systems using Single

Shot Learning this is explained in detail in

Single Shot Learning section

Optimizer Meta-Learning

In this method the emphasis is on

optimizing the neural network and its

hyper- parameters A great example of

optimizer meta-learning are models that

are focused on improving gradient descent

techniques

Metric Meta-Learning

In this learning method the metric space

is narrowed down to improve the focus of

learning Then the learning is carried out

only in this metric space by leveraging

various optimization parameters that are

established for the given metric space

Recurrent Model Meta-Learning

This type of meta-learning model is tailored

to Recurrent Neural Networks(RNNs)

such as Long-Short-Term-Memory(LSTM)

In this architecture the meta-learner

algorithm will train a RNN model to process

a dataset sequentially and then process

new inputs from the task In an image

classification setting this might involve

passing in the set of (image label) pairs

of a dataset sequentially followed by new

examples which must be classified Meta-

Reinforcement Learning is an example of

this approach

Meta Learning

Transfer Learning (TL)

Types of Meta-Learning Models

people learn and memorize with one

instance of visual or auditory scan Some

people need multiple perspectives to

strengthen the neural connections for

permanent memory Some remember by

writing while some remember through

actual experiences Meta Learning tries

to leverage these to build its learning

characteristics

However with Transfer Learning you can

introduce an additional layer on top of the

existing pre-trained layer to start detecting

airplanes

Typically in a no Transfer Learning

scenario model needs to be trained and

during training right weights are arrived

at by doing many iterations (epochs) of

forward and back propagation which takes

significant amount of computation power

and time In addition Vision models need

significant amount of image data such as

in this example images of airplanes to be

trained

With Transfer Learning approach you can

reuse the existing pre-trained weights of an

existing trained model with significantly

less number of images ( 5 to 10 percent

of actual images needed for training

ground up model) for the model to start

detecting As the pre-trained model has

already learnt some basic learning around

identifying edges curves and shapes in the

earlier layers it needs to learn only higher

order features specific to airplanes with

the existing computed weights In brief

Transfer Learning helps eliminate the need

to learn anything from scratch

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Transfer Learning helps in saving

significant amount of data computational

power and time in training new models as

they leverage pre-trained weights from the

existing trained models and architectures

However it is important to understand

that Transfer Learning approach today

is only matured enough to be applied to

similar use cases that is you cannot use

the above discussed model to train a facial

recognition model

Another key thing during Transfer Learning

is that it is important to understand the

details of the data on which new use cases

are being trained as it can implicitly push

the built-in biases from the underlying data

into newer systems It is recommended

that the datasheets of underlying models

and data be studied thoroughly unless the

usage is for experimentative purpose

Earlier having used the human brain

rationale it is important to note that

human brains have gone through centuries

of experiences and gene evolution and has

the ability to learn faster whereas transfer

learning is just a few decades old and is

becoming ground for new vision and text

use cases

External Document copy 2019 Infosys Limited

Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 9: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Routing by Agreement

Advantage over CNN

Unlike CNN which primarily bubbles higher

order features using max or avg pooling

Capsule Network bubbles up features

using routing by agreement where every

capsule participates in choosing the shape

by voting (democratic election way)

In the figure given above

Lower level corresponds to rectangles

triangles and circles

High level corresponds to houses

boats and cars

If there is an image of a house the

capsules corresponding to rectangles

and triangles will have large activation

vectors Their relative positions (coded

in their instantiation parameters) will bet

on the presence of high-level objects

Since they will agree on the presence of

house the output vector of the house

capsule will become large This in turn

will make the predictions by the rectangle

and the triangle capsules larger This

cycle will repeat 4-5 times after which the

bets on the presence of a house will be

considerably larger than the bets on the

presence of a boat or a car

Less data for training - Capsule

Networks need very less data for

training (almost 10) as compared to

CNN

Fewer parameters The connections

between layers require fewer

parameters as capsule groups neurons

resulting in relatively less computations

bandwidth

Preserve pose and position - They

preserve pose and position information

as against CNN

High accuracy - Capsule Networks

have higher accuracy as compared to

CNNs

Reconstruction vs mere classification

- CNN helps you to classify the images

but not reconstruct the same image

whereas Capsule Networks help you to

reconstruct the exact image

Information retention vs loss - With

CNN during edge detection kernel

for edge detection works only on a

specific angle and each angle requires

a corresponding kernel When dealing

with edges CNN works well because

there are very few ways to describe

an edge Once we get up to the level

of shapes we do not want to have a

kernel for every angle of rectangles

ovals triangles and so on It would

get unwieldy and would become

even worse when dealing with more

complicated shapes that have 3

dimensional rotations and features like

lighting the reason why traditional

neural nets do not handle unseen

rotations effectively

Capsule Networks are best suited for

object detection and image segmentation

while it helps better model hierarchical

relationships and provides high accuracy

However Capsule Networks are still under

research and relatively new and mostly

tested and benchmarked on MNIST

dataset but they will be the future in

working with massive use cases emerging

from Vision datasets

Figure 70 A simple CapsNet with 3 layers This model gives comparable results to deep convolutional networks Source Dynamic

Routing Between Capsules Sara Sabour Nicholas Frosst Geoffrey E Hinton

Figure 80 Capsule Network for House or Boat classification Source Beginnersrsquo Guide to Capsule Networks

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Traditional methods of learning in Machine

Learning focuses on taking a huge labeled

dataset and then learning to detect y

(dependent variable say classifying an

image as cat or dog) and given set of x

(independent variables images of cats

and dogs) This process involves selection

of an algorithm such as Convolution

Neural Net and arriving at various hyper

parameters such as number of layers

in the network number of neurons in

each layer learning rate weights bias

dropouts activation function to activate

the neuron such as sigmoid tanh and

Relu The learning happens through

several iterations of forward and backward

passes (propagation) by readjusting (also

called learning) the weights based on

difference in the loss (actual vs computed)

At the minimal loss the weights and

other network parameters are frozen

and are considered final model for future

prediction tasks This is obviously a long

and tedious process and repeating this for

every use case or task is engineering data

and compute intensive

Meta Learning focuses on how to learn to

learn It is one of the fascinating discipline

of artificial intelligence Human beings

have varying styles of learning Some

Humans can learn from their own existing

experiences or experiences they have

heard seen or observed Transfer Learning

discipline of AI is based on similar traits of

human learning where new models can

learn and benefit from existing trained

model

For example if a Computer Vision based

detection model with no Transfer

Learning that already detects various

types of vehicles such as cars trucks and

bicycles needs to be trained to detect an

airplane then you may have to retrain the

full model with images of all the previous

objects

Like the variety in human learning

techniques Meta Learning also uses

various learning methods based on

patterns of problems such as those based

on boundary space amount of data by

optimizing size of neural network or using

recurrent network approach Each of these

are briefly discussed inline

Few Shots Meta-Learning

This learning technique focuses on

learning from a few instances of data

Typically Neural Nets need millions of

data points to learn however Few Shots

Meta- Learning uses only a few instances

of data to build models Examples being

Facial recognition systems using Single

Shot Learning this is explained in detail in

Single Shot Learning section

Optimizer Meta-Learning

In this method the emphasis is on

optimizing the neural network and its

hyper- parameters A great example of

optimizer meta-learning are models that

are focused on improving gradient descent

techniques

Metric Meta-Learning

In this learning method the metric space

is narrowed down to improve the focus of

learning Then the learning is carried out

only in this metric space by leveraging

various optimization parameters that are

established for the given metric space

Recurrent Model Meta-Learning

This type of meta-learning model is tailored

to Recurrent Neural Networks(RNNs)

such as Long-Short-Term-Memory(LSTM)

In this architecture the meta-learner

algorithm will train a RNN model to process

a dataset sequentially and then process

new inputs from the task In an image

classification setting this might involve

passing in the set of (image label) pairs

of a dataset sequentially followed by new

examples which must be classified Meta-

Reinforcement Learning is an example of

this approach

Meta Learning

Transfer Learning (TL)

Types of Meta-Learning Models

people learn and memorize with one

instance of visual or auditory scan Some

people need multiple perspectives to

strengthen the neural connections for

permanent memory Some remember by

writing while some remember through

actual experiences Meta Learning tries

to leverage these to build its learning

characteristics

However with Transfer Learning you can

introduce an additional layer on top of the

existing pre-trained layer to start detecting

airplanes

Typically in a no Transfer Learning

scenario model needs to be trained and

during training right weights are arrived

at by doing many iterations (epochs) of

forward and back propagation which takes

significant amount of computation power

and time In addition Vision models need

significant amount of image data such as

in this example images of airplanes to be

trained

With Transfer Learning approach you can

reuse the existing pre-trained weights of an

existing trained model with significantly

less number of images ( 5 to 10 percent

of actual images needed for training

ground up model) for the model to start

detecting As the pre-trained model has

already learnt some basic learning around

identifying edges curves and shapes in the

earlier layers it needs to learn only higher

order features specific to airplanes with

the existing computed weights In brief

Transfer Learning helps eliminate the need

to learn anything from scratch

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Transfer Learning helps in saving

significant amount of data computational

power and time in training new models as

they leverage pre-trained weights from the

existing trained models and architectures

However it is important to understand

that Transfer Learning approach today

is only matured enough to be applied to

similar use cases that is you cannot use

the above discussed model to train a facial

recognition model

Another key thing during Transfer Learning

is that it is important to understand the

details of the data on which new use cases

are being trained as it can implicitly push

the built-in biases from the underlying data

into newer systems It is recommended

that the datasheets of underlying models

and data be studied thoroughly unless the

usage is for experimentative purpose

Earlier having used the human brain

rationale it is important to note that

human brains have gone through centuries

of experiences and gene evolution and has

the ability to learn faster whereas transfer

learning is just a few decades old and is

becoming ground for new vision and text

use cases

External Document copy 2019 Infosys Limited

Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene, product identification) would need a different neural network architecture style than Speech (speech transcription, speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on other domains' data and performance. These are also usually handcrafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architectures considered by the optimization method. It could be done with a full training approach, or by doing partial training and then applying specialized methods such as early stopping, weight sharing, network morphism, etc.

For selective problem spaces, NAS has outperformed manual methods and is showing definite promise for the future. However, it is still evolving and is not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.
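The three components can be pictured as pluggable functions. The sketch below wires a toy search space to the simplest optimization method, random search, and a stubbed evaluation method; the space, the proxy_score stub and the trial count are assumptions for illustration, and a real system would substitute reinforcement learning or Bayesian search for the sampling and partial training for the stub.

import random

SEARCH_SPACE = {                      # search space: which architectures are legal
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}

def sample_architecture():            # optimization method: plain random search
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def proxy_score(arch):                # evaluation method: stub for partial training
    # Hypothetical proxy for "train briefly, return validation accuracy".
    return random.random() - 0.002 * arch["num_layers"] * arch["filters"] / 64

random.seed(0)
best = max((sample_architecture() for _ in range(50)), key=proxy_score)
print("selected architecture:", best)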

Figure 14.0: Components of NAS. Source: Liam Li, Ameet Talwalkar, What is neural architecture search?
• Search Space: DAG Representation, Cell Block, Meta-Architecture, NAS Specific
• Optimization Method: Reinforcement Learning, Evolutionary Search, Gradient-Based Optimization, Bayesian Optimization
• Evaluation Method: Full Training, Partial Training, Weight-Sharing, Network Morphism, Hypernetworks

Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; this is by no means an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend and use cases:

1. Explainable AI (XAI): applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification

Table 2.0: AI Use cases. Source: Infosys Research


References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5

8. Auto ML
• https://arxiv.org/pdf/1811.12560.pdf
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

About the author

Sudhanshu Hate is the inventor and architect of Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions, and in working with clients on industry problems. His current areas of interest are Computer Vision, Speech and Unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

Page 10: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Traditional methods of learning in Machine

Learning focuses on taking a huge labeled

dataset and then learning to detect y

(dependent variable say classifying an

image as cat or dog) and given set of x

(independent variables images of cats

and dogs) This process involves selection

of an algorithm such as Convolution

Neural Net and arriving at various hyper

parameters such as number of layers

in the network number of neurons in

each layer learning rate weights bias

dropouts activation function to activate

the neuron such as sigmoid tanh and

Relu The learning happens through

several iterations of forward and backward

passes (propagation) by readjusting (also

called learning) the weights based on

difference in the loss (actual vs computed)

At the minimal loss the weights and

other network parameters are frozen

and are considered final model for future

prediction tasks This is obviously a long

and tedious process and repeating this for

every use case or task is engineering data

and compute intensive

Meta Learning focuses on how to learn to

learn It is one of the fascinating discipline

of artificial intelligence Human beings

have varying styles of learning Some

Humans can learn from their own existing

experiences or experiences they have

heard seen or observed Transfer Learning

discipline of AI is based on similar traits of

human learning where new models can

learn and benefit from existing trained

model

For example if a Computer Vision based

detection model with no Transfer

Learning that already detects various

types of vehicles such as cars trucks and

bicycles needs to be trained to detect an

airplane then you may have to retrain the

full model with images of all the previous

objects

Like the variety in human learning

techniques Meta Learning also uses

various learning methods based on

patterns of problems such as those based

on boundary space amount of data by

optimizing size of neural network or using

recurrent network approach Each of these

are briefly discussed inline

Few Shots Meta-Learning

This learning technique focuses on

learning from a few instances of data

Typically Neural Nets need millions of

data points to learn however Few Shots

Meta- Learning uses only a few instances

of data to build models Examples being

Facial recognition systems using Single

Shot Learning this is explained in detail in

Single Shot Learning section

Optimizer Meta-Learning

In this method the emphasis is on

optimizing the neural network and its

hyper- parameters A great example of

optimizer meta-learning are models that

are focused on improving gradient descent

techniques

Metric Meta-Learning

In this learning method the metric space

is narrowed down to improve the focus of

learning Then the learning is carried out

only in this metric space by leveraging

various optimization parameters that are

established for the given metric space

Recurrent Model Meta-Learning

This type of meta-learning model is tailored

to Recurrent Neural Networks(RNNs)

such as Long-Short-Term-Memory(LSTM)

In this architecture the meta-learner

algorithm will train a RNN model to process

a dataset sequentially and then process

new inputs from the task In an image

classification setting this might involve

passing in the set of (image label) pairs

of a dataset sequentially followed by new

examples which must be classified Meta-

Reinforcement Learning is an example of

this approach

Meta Learning

Transfer Learning (TL)

Types of Meta-Learning Models

people learn and memorize with one

instance of visual or auditory scan Some

people need multiple perspectives to

strengthen the neural connections for

permanent memory Some remember by

writing while some remember through

actual experiences Meta Learning tries

to leverage these to build its learning

characteristics

However with Transfer Learning you can

introduce an additional layer on top of the

existing pre-trained layer to start detecting

airplanes

Typically in a no Transfer Learning

scenario model needs to be trained and

during training right weights are arrived

at by doing many iterations (epochs) of

forward and back propagation which takes

significant amount of computation power

and time In addition Vision models need

significant amount of image data such as

in this example images of airplanes to be

trained

With Transfer Learning approach you can

reuse the existing pre-trained weights of an

existing trained model with significantly

less number of images ( 5 to 10 percent

of actual images needed for training

ground up model) for the model to start

detecting As the pre-trained model has

already learnt some basic learning around

identifying edges curves and shapes in the

earlier layers it needs to learn only higher

order features specific to airplanes with

the existing computed weights In brief

Transfer Learning helps eliminate the need

to learn anything from scratch

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Transfer Learning helps in saving

significant amount of data computational

power and time in training new models as

they leverage pre-trained weights from the

existing trained models and architectures

However it is important to understand

that Transfer Learning approach today

is only matured enough to be applied to

similar use cases that is you cannot use

the above discussed model to train a facial

recognition model

Another key thing during Transfer Learning

is that it is important to understand the

details of the data on which new use cases

are being trained as it can implicitly push

the built-in biases from the underlying data

into newer systems It is recommended

that the datasheets of underlying models

and data be studied thoroughly unless the

usage is for experimentative purpose

Earlier having used the human brain

rationale it is important to note that

human brains have gone through centuries

of experiences and gene evolution and has

the ability to learn faster whereas transfer

learning is just a few decades old and is

becoming ground for new vision and text

use cases

External Document copy 2019 Infosys Limited

Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 11: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Transfer Learning helps in saving

significant amount of data computational

power and time in training new models as

they leverage pre-trained weights from the

existing trained models and architectures

However it is important to understand

that Transfer Learning approach today

is only matured enough to be applied to

similar use cases that is you cannot use

the above discussed model to train a facial

recognition model

Another key thing during Transfer Learning

is that it is important to understand the

details of the data on which new use cases

are being trained as it can implicitly push

the built-in biases from the underlying data

into newer systems It is recommended

that the datasheets of underlying models

and data be studied thoroughly unless the

usage is for experimentative purpose

Earlier having used the human brain

rationale it is important to note that

human brains have gone through centuries

of experiences and gene evolution and has

the ability to learn faster whereas transfer

learning is just a few decades old and is

becoming ground for new vision and text

use cases

External Document copy 2019 Infosys Limited

Figure 90 Transfer Learning Layers Source John Cherrie Training Deep Learning Models with Transfer Learning

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 12: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Humans have the impressive skill to reason

about new concepts and experiences with

just a single example They have the ability

for one-shot generalization the aptitude

to encounter a new concept understand

its structure and then generate compelling

alternative variations of the same

Facial recognition systems are good

candidates for Single Shot Learning

otherwise needing ten thousands of

individual face images to train one neural

network can be extremely costly time

consuming and infeasible However a

Single Shot LearningSingle Shot Learning based system using

existing pre-trained FaceNet model and

facial encoding based approach on top of

it can be very effective to establish face

similarity by computing distance between

the faces

In this approach 128 bit encoding of each

face image is generated and compared

with other imagersquos encoding to determine

if the person is same or different

Various distance based algorithms such

as Euclidean distance can be used to

determine if they are within specified

threshold The model training approach

involves creating pairs of (Anchor Positive)

and (Anchor Negative) and training the

model in a way where (Anchor Positive)

pair distance difference is smaller and

(Anchor Negative) distance is farther

ldquoAnchorrdquo is the image of a person for whom

the recognition model needs to be trained

ldquoPositiverdquo is another image of the same

person

ldquoNegativerdquo is image of a different person

External Document copy 2019 Infosys Limited

Figure 100 Encoding approach inspired from ML Course from Coursera

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Deep Reinforcement Learning (RL)

This is a specialized Machine Learning discipline where an agent learns to behave in an environment by receiving rewards or punishments for the actions it performs. The agent can have an objective to maximize short-term or long-term rewards. The discipline uses deep learning techniques to bring in human-level performance on the given task.

Deep Reinforcement Learning has found significant relevance and application in various game design systems, such as video games, chess, AlphaGo and Atari, as well as in industrial applications such as robots, driverless cars etc.

Three Approaches to Reinforcement Learning

In reinforcement learning, the policy π controls what action we should take, while the value function v measures how good it is to be in a particular state.

Value Based

In value-based RL, the goal is to optimize the value function V(s). The value function tells us the maximum expected future reward the agent will get at each state; the value of each state is the total amount of reward an agent can expect to accumulate over the future, starting at that state. The agent uses this value function to select which state to choose at each step. A Q-table uses a mathematical function to arrive at a state based on an action.

Policy Based

In policy-based RL, we want to directly optimize the policy function π(s) without using a value function. The policy is what defines the agent's behavior at a given time. There are two types of policies:

1. Deterministic: a policy which, at a given state, will always return the same action.

2. Stochastic: a policy that outputs a probability distribution over actions.

Value-based and policy-based are the more conventional Reinforcement Learning approaches. They are useful for modeling relatively simple systems.
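As a toy illustration of the difference between the two policy types (the states, actions and probabilities here are hypothetical):

import random

def deterministic_policy(state):
    # Always returns the same action for a given state.
    return "right" if state % 2 == 0 else "left"

def stochastic_policy(state):
    # Samples an action from a probability distribution π(a|s).
    probabilities = {"left": 0.3, "right": 0.7}  # illustrative numbers
    actions = list(probabilities)
    weights = list(probabilities.values())
    return random.choices(actions, weights=weights)[0]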

Figure 11.0: Q-learning versus Deep Q-learning. In Q-learning, a Q-table maps a state to the Q value (expected discounted reward) of an action; in Deep Q-learning, a deep Q neural network takes the state and outputs a Q value per action, with action = policy(state). Schema inspired by the Q-learning notebook by Udacity.
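To make the value-based approach concrete, here is a minimal tabular Q-learning sketch of the update rule behind the figure; the state and action counts and the hyper-parameter values are illustrative assumptions:

import numpy as np

n_states, n_actions = 16, 4
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

def choose_action(state):
    # Epsilon-greedy policy: explore occasionally, otherwise exploit the Q-table.
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(q_table[state]))

def q_update(state, action, reward, next_state):
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = np.max(q_table[next_state])
    q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])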


Model Based

In model-based RL, we model the environment. This means we create a model of the behavior of the environment, and this model is then used to arrive at results that maximize short-term or long-term rewards. The model equation can be any equation that is defined based on the environment's behavior, and it must be sufficiently generalized to handle new situations.

When the model-based approach uses Deep Neural Network algorithms to generalize sufficiently well and learn the complexities of the environment to produce optimal results, it is called Deep Reinforcement Learning. The challenge with the model-based approach is that each environment needs a dedicated trained model.

AlphaGo was trained using data from several games to beat human players at the game of Go. The training accuracy was just 57%, and still it was sufficient to beat human-level performance. The training methods involved reinforcement learning and deep learning to build a policy network that tells which moves are promising, and a value network that tells how good the board position is. The search for the final move from these networks is done using the Monte Carlo Tree Search (MCTS) algorithm. Using supervised learning, a policy network was created to imitate expert moves.

DeepMind released AlphaGo Zero in late 2017, which beat AlphaGo and did not involve any training of the deep network on data from previous games. The deep network was trained by picking training samples from AlphaGo and AlphaGo Zero playing games against itself, selecting the best moves to train the network, and then applying those in real games to improve the results iteratively. This is possible because deep reinforcement learning algorithms can store long-range tree search results for the next best move in memory and perform very large computations that are difficult for a human brain.

Auto ML (AML)

Designing a machine learning solution involves several steps, such as collecting data; understanding, cleansing and normalizing data; doing feature engineering; selecting or designing the algorithm; selecting the model architecture; selecting and tuning the model's hyper-parameters; evaluating the model's performance; deploying and monitoring the machine learning system in an online system; and so on. Such machine learning solution design requires an expert Data Scientist to complete the pipeline.

As the complexity of these and other tasks can easily get overwhelming, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. The AI research area that encompasses progressive automation of machine learning pipeline tasks is called AutoML (Automated Machine Learning).

Google CEO Sundar Pichai wrote, "Designing neural nets is extremely time intensive, and requires an expertise that limits its use to a smaller community of scientists and engineers. That's why we've created an approach called AutoML, showing that it's possible for neural nets to design neural nets," while Google's Head of AI, Jeff Dean, suggested that 100x computational power could replace the need for machine learning expertise.

AutoML Vision relies on two core techniques: transfer learning and neural architecture search.

Figure 12.0: An example auto-sklearn pipeline. Given Xtrain, Ytrain, Xtest and a budget, meta-learning over a hand-crafted portfolio warm-starts an ML pipeline (data processor, feature preprocessor, classifier) that is tuned with Bayesian optimization, and an ensemble is built to produce Ytest. Source: André Biedenkapp, "We did it Again: World Champions in AutoML".


Implementing AutoML

Here is a look at a few libraries that help in implementing AutoML.

AUTO-SKLEARN

Auto-sklearn automates several key tasks in the Machine Learning pipeline, such as addressing missing column values, encoding of categorical values, data scaling and normalization, feature pre-processing, and selection of the right algorithm with its hyper-parameters. The pipeline supports 15 classification and 14 feature processing algorithms. Selection of the right algorithm can happen based on ensembling techniques and by applying meta knowledge gathered from executing similar scenarios (datasets and algorithms).

Usage

Auto-sklearn is written in Python and can be considered a drop-in replacement for scikit-learn classifiers. Here is a sample set of commands:

>>> import autosklearn.classification
>>> cls = autosklearn.classification.AutoSklearnClassifier()
>>> cls.fit(X_train, y_train)
>>> predictions = cls.predict(X_test)
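For context, here is a hedged end-to-end sketch of the same API on a toy dataset; the dataset choice and time budget are illustrative assumptions, not from the paper:

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification

# Illustrative dataset; any tabular classification data would do.
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

# Limit the overall search budget (in seconds) so the run stays small.
cls = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=300)
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
print("Accuracy:", sklearn.metrics.accuracy_score(y_test, predictions))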

SMAC (Sequential Model-Based Algorithm Configuration)

SMAC is a tool for automating certain AutoML steps. It is useful for selecting key features, optimizing hyper-parameters and speeding up algorithmic outputs.

BOHB (Bayesian Optimization and Hyperband)

BOHB combines Bayesian hyper-parameter optimization with bandit-based methods (Hyperband) for faster convergence.
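To make the bandit half concrete, the following is a minimal sketch of successive halving, the budget-allocation scheme underlying Hyperband; the function names and defaults are illustrative assumptions:

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    # configs: hashable configurations (e.g., tuples of hyper-parameters).
    # Evaluate all configurations on a small budget, keep the top 1/eta,
    # then repeat with eta times more budget for the survivors.
    budget = min_budget
    while len(configs) > 1:
        scores = {cfg: evaluate(cfg, budget) for cfg in configs}
        keep = max(1, len(configs) // eta)
        configs = sorted(configs, key=scores.get, reverse=True)[:keep]
        budget *= eta
    return configs[0]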

Google and H2O also have their respective AutoML tools, which are not covered here but can be explored for specific cases.

AutoML needs significant memory and computational power to execute alternative algorithms and compute results. At present, GPU resources are extremely costly even for simple Machine Learning workloads, such as a CNN algorithm to classify objects. If multiple alternative algorithms have to be executed, the computation cost would grow exponentially. This is impractical, infeasible and inefficient for the current state of the Data Science industry. Adoption of AutoML will therefore depend on two things: first, the maturity of the AutoML pipeline, and second, but more important, how quickly GPU clusters become cheap; the second is the most critical. Selling cloud GPU capacity could be one of the motivations for several cloud infrastructure companies to promote AutoML in the industry. Also, AutoML will not replace the Data Scientist's work, but it can provide augmentation and speed for certain tasks such as data standardization, model tuning and trying multiple algorithms. It is only the beginning for AutoML, but the technique has high relevance and usefulness for solving ultra-complex problems.


Neural Architecture Search (NAS)

Neural Architecture Search (NAS) is a component of AutoML and addresses the important step of designing the Neural Network architecture.

Designing a fresh Neural Net architecture involves an expert establishing and organizing Neural Network layers, filters or channels and filter sizes, selecting other optimum hyper-parameters, and so on, through several rounds of computational iterations. Since the AlexNet deep neural network architecture won the ImageNet competition (image classification on the ImageNet dataset) in 2012, several architecture styles such as VGG, ResNet, Inception, Xception, InceptionResNet, MobileNet and NASNet have evolved significantly. However, selecting the right architecture for the right problem is itself a skill, because various influencing factors, such as applicability to the problem, accuracy, number of parameters, memory and computational footprint, and size of the architecture, govern overall functioning efficiency.

Neural Architecture Search tries to address this problem space by automatically selecting the right Neural Network architecture for a given problem.

Figure 13.0: NAS and hyperparameter optimization as overlapping components within AutoML. Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"


Key Components of NAS

Search space: The search space provides the boundary within which the specific architecture needs to be searched. Computer Vision use cases (captioning a scene or product identification) would need a different neural network architecture style than Speech (speech transcription or speaker classification) or unstructured Text (topic extraction, intent mining) use cases. The search space tries to provide available catalogs of best-in-class architectures based on data and performance from other domains. These are also usually hand crafted by expert data scientists.

Optimization method: This is responsible for providing the mechanism to search for the best architecture. Candidates could be searched and applied randomly, or by using a statistical or Machine Learning evaluation approach such as Bayesian methods or reinforcement learning methods.

Evaluation method: This has the role of evaluating the quality of the architecture considered by the optimization method. It could be done with a full training approach, or with partial training followed by specialized methods such as early stopping, weight sharing, network morphism etc. A minimal sketch of how these three components interact follows.
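This deliberately simplified Python example uses random search as the optimization method and stubs the evaluation with a placeholder score; the search-space values and function names are illustrative assumptions, not from the paper:

import random

SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}

def sample_architecture():
    # Optimization method (here: plain random search over the search space).
    return {name: random.choice(values) for name, values in SEARCH_SPACE.items()}

def evaluate(architecture):
    # Evaluation method: in practice this would partially train the candidate
    # network (early stopping, weight sharing, etc.); stubbed with a random
    # score to keep the sketch self-contained.
    return random.random()

def search(trials=20):
    best_architecture, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = sample_architecture()
        score = evaluate(candidate)
        if score > best_score:
            best_architecture, best_score = candidate, score
    return best_architecture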

For selective problem spaces, NAS has outperformed manual methods and shows definite promise for the future. However, it is still evolving and not ready for production usage, as several architectures need to be established and evaluated depending on the problem space.

Figure 14.0: Components of NAS. Search Space: DAG representation, cell block, meta-architecture, NAS-specific. Optimization Method: reinforcement learning, evolutionary search, gradient-based optimization, Bayesian optimization. Evaluation Method: full training, partial training, weight-sharing, network morphism, hypernetworks. Source: Liam Li, Ameet Talwalkar, "What is neural architecture search?"


Addressing H3 AI Trends at Infosys

In this paper we looked at some key H3 AI areas; by no means is this an exhaustive list. Amongst all those discussed, Transfer Learning, Capsule Networks, Explainable AI and Generative AI are making interesting things possible and look highly promising. We are keenly experimenting with these, building early use cases and integrating them into our product stack, the Infosys Enterprise Cognitive Platform (iECP), to solve interesting client problems. Here is a look at how we are employing these H3 trends in the work we do.

Trend: Use cases

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g. Tumor Detection, Mortgage Rejection, Candidate Selection etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification etc.

Table 2.0: AI Use cases, Infosys Research


Reference

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech and Unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

Page 13: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

This is a specialized Machine Learning

discipline where an agent learns to behave

in an environment by getting reward or

punishment for the actions performed The

agent can have an objective to maximize

short term or long-term rewards This

discipline uses deep learning techniques to

Value Based

In value-based RL the goal is to optimize

the value function V(s) Qtable uses any

The agent will use this value function to

select which state to choose at each step

Policy Based

In policy-based RL we want to directly

optimize the policy function π(s) without

using a value function

The policy is what defines the agent

behavior at a given time

Deep Reinforcement Learning (RL)

Three Approaches to Reinforcement Learning

bring in human level performance on the

given task

Deep Reinforcement Learning has found

significant relevance and application in

various game design systems such as

creating video games chess alpha Go

Atari as well as in industrial applications of

mathematical function to arrive at a state

based on action

The value of each state is the total amount

There are two types of policies

1 Deterministic A policy which at a given

state will always return the same action

2 Stochastic A policy that outputs a

distribution probability over actions

Value based and Policy based are more

conventional Reinforcement Learning

approaches They are useful for modeling

relatively simple systems

robots driverless car etc

In reinforcement learning policy p

controls what action we should take Value

function v measures how good it is to be

in a particular state The value function

tells us the maximum expected future

reward the agent will get at each state

of the reward an agent can expect to

accumulate over the future starting at that

state

State

State

Q value

Q value action 2

Q value action 1

Q value action 3

Qtable

Deep Q Neuralnetwork

Q learning

Deep Q learning

Action

ExpectedReward discounted

Given that state

action = policy(state)

Figure 110 Schema inspired by the Q learning notebook by Udacity

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 14: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Model Based

In model-based RL we model the

environment This means we create a

model of the behavior of the environment

then this model is used to arrive at results

that maximises short term or long-term

rewards The model equation can be

any equation that is defined based on

the environments behavior and must be

sufficiently generalized to counter new

situations

When Model based approach uses Deep

Neural Network algorithms to sufficiently

well generalize and learn the complexities

of the environment to produce optimal

results it is called Deep Reinforcement

Learning The challenge with model based

approach is each environment needs a

dedicated trained model

AlphaGo was trained by using data from

several games to beat the human being

in the game of Go The training accuracy

was just 57 and still it was sufficient to

beat the human level performance The

training methods involved reinforcement

learning and deep learning to build a

policy network that tells what moves are

promising and a value network that tells

how good the board position is Searches

for the final move from these networks

is done using Monte Carlo Tree Search

(MCTS) algorithm Using supervised

learning a policy network was created to

imitate the expert moves

Deep Mind released AlphaGo Zero in late

2017 which beat AlphaGo and did not

involve any training from previous games

data to train deep network The deep

network training was done by picking

the training samples from AlphaGo and

AlphaGo Zero playing games against

itself and selecting best moves to train

the network and then applying those

in real games to improve the results

iteratively This is possible because deep

reinforcement learning algorithms can

store long-range tree search results for the

next best move in memory and do very

large computations that are difficult for a

human brain

Designing machine learning solution

involves several steps such as collecting

data understanding cleansing and

normalizing data doing feature

engineering selecting or designing

the algorithm selecting the model

architecture selecting and tuning modelrsquos

hyper-parameters evaluating modelrsquos

performance deploying and monitoring

the machine learning system in an online

system and so on Such machine learning

solution design requires an expert Data

Scientist to complete the pipeline

Auto ML (AML)As the complexity of these and other tasks

can easily get overwhelming the rapid

growth of machine learning applications

has created a demand for off-the-shelf

machine learning methods that can be

used easily and without expert knowledge

The AI research area that encompasses

progressive automation of machine

learning pipeline tasks is called AutoML

(Automatic Machine Learning)

Google CEO Sundar Pichai wrote

ldquoDesigning neural nets is extremely time

intensive and requires an expertise that

limits its use to a smaller community of

scientists and engineers Thatrsquos why wersquove

created an approach called AutoML

showing that itrsquos possible for neural nets

to design neural netsrdquo while Googlersquos

Head of AI Jeff Dean suggested that 100x

computational power could replace the

need for machine learning expertise

AutoML Vision relies on two core

techniques transfer learning and neural

architecture search

Xtrain Ytrain

Xtest budget

Han

d-cr

afte

d po

rtfo

lio Meta Learning

AutoML system

Build Ensemble Ytest Data

ProcessorFeature

Preprocessor

Bayesian Optimization

Classier

ML Pipeline

Figure 120 An example of Auto sklearn pipeline Source Andreacute Biedenkapp We did it Again World Champions in AutoML

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

Trend Use cases

1

2

3

4

5

6

7

8

9

10

Explainable AI (XAI)

Generative AI Neural Style Transfer (NST)

Fine Grained Classication

Capsule Networks

Meta Learning

Transfer Learning

Single Shot Learning

Deep Reinforcement Learning (RL)

Auto ML

Neural Architecture Search (NAS)

Applicable across where results need to be traced eg Tumor Detection Mortgage Rejection Candidate Selection etc

Art Generation Sketch Generation Image or Video Resolution Improvements Data GenerationAugmentation Music Generation

Vehicle Classication Type of Tumor Detection

Image Re-constructionImage ComparisonMatching

Intelligent Agents Continuous Learning scenarios for document review and corrections

Identifying person not wearing helmet Logobrand detection in the image Speech Model training for various accents vocabularies

Face Recognition Face Verication

Intelligent Agents Robots Driverless cars Trac Light Monitoring Continuous Learning scenarios for document review and corrections

Invoice Attribute Extraction Document Classication Document Clustering

CNN or RNN based use cases such as Image Classication Object Identication Image Segmentation Speaker Classication etc

External Document copy 2019 Infosys Limited

Table 20 AI Use cases Infosys Research

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

1 Explainable AI (XAI)

2 Fine Grained Classification

5 Transfer Learning

6 Single Shot Learning

3 Capsule Networks

4 Meta Learning

7 Deep Reinforcement Learning (RL)

8 Auto ML

bull httpschristophmgithubiointerpretable-ml-book

bull httpssimmachinescomexplainable-ai

bull httpswwwcmuedunewsstoriesarchives2018octoberexplainable-aihtml

bull httpsmediumcomQuantumBlackmaking-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c

bull httpstowardsdatasciencecomexplainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

bull httpsvisioncornelleduse3wp-contentuploads201502BMVC14pdf

bull httpswwwfastai20180723auto-ml-3

bull httpsarxivorgpdf160305106pdf

bull httpsarxivorgpdf171009829pdf

bull httpskerasioexamplescifar10_cnn_capsule

bull httpswwwyoutubecomwatchv=pPN8d0E3900

bull httpswwwyoutubecomwatchv=rTawFwUvnLE

bull httpsmediumfreecodecamporgunderstanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

bull httpsmediumcomjrodthoughtswhats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0

bull httpproceedingsmlrpressv48santoro16pdf

bull httpstowardsdatasciencecomwhats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

bull httpsdeepmindcomblogarticledeep-reinforcement-learning

bull httpsmediumfreecodecamporgan-introduction-to-reinforcement-learning-4339519de419

bull httpsmediumcomjonathan_huialphago-zero-a-game-changer-14ef6e45eba5

bull httpsarxivorgpdf181112560pdf

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpswwwfastai20180723auto-ml-3

bull httpswwwfastai20180716auto-ml2auto-ml

bull httpscompetitionscodalaborgcompetitions17767

bull httpswwwautomlorgautomlauto-sklearn

bull httpswwwml4aadorgautomated-algorithm-designalgorithm-configurationsmac

bull httpsautomlgithubioHpBandSterbuildhtmloptimizersbohbhtml

Reference

copy 2019 Infosys Limited Bengaluru India All Rights Reserved Infosys believes the information in this document is accurate as of its publication date such information is subject to change without notice Infosys acknowledges the proprietary rights of other companies to the trademarks product names and such other intellectual property rights mentioned in this document Except as expressly permitted neither this documentation nor any part of it may be reproduced stored in a retrieval system or transmitted in any form or by any means electronic mechanical printing photocopying recording or otherwise without the prior permission of Infosys Limited and or any named intellectual property rights holders under this document

For more information contact askusinfosyscom

Infosyscom | NYSE INFY Stay Connected

9 Neural Architecture Search (NAS)

10 Infosys Enterprise Cognitive Platform

bull httpswwworeillycomideaswhat-is-neural-architecture-search

bull httpswwwinfosyscomservicesincubating-emerging-technologiesofferingsPagesenterprise-cognitive-platformaspx

Sudhanshu Hate is inventor and architect of Infosys Enterprise Cognitive Platform (iECP)

a microservices API based Artificial Intelligence platform He has over 21 years of experience

in creating products solutions and working with clients on industry problems His current

areas of interests are Computer Vision Speech and Unstructured Text based AI possibilities

To know more about our work on the H3 trends in AI write to icetsinfosyscom

About the author

Page 15: H3 Trends in AI Algorithms: The Infosys Way...• Sentiment Analysis Figure 1.0: ... from use cases such as language translations, sentence formulation, text summarization, topic extraction

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Here is a look at the few libraries that help

in implementing AutoML

AUTO-SKLEARN

AUTO SKLEARN automates several key

tasks in Machine Learning pipeline such

as addressing column missing values

encoding of categorical values data scaling

and normalization feature pre-processing

and selection of right algorithm with

hyper-parameters The pipeline supports

15 Classification and 14 Feature processing

algorithms Selection of right algorithm can

happen based on ensembling techniques

and applying meta knowledge gathered

from executing similar scenarios (datasets

and algorithms)

Usage

Auto-sklearn is written in python and can

be considered as replacement for scikit-

learn classifiers Here is a sample set of

commands

gtgtgt import autosklearnclassification

Implementing AutoML gtgtgt cls = autosklearnclassification

AutoSklearnClassifier()

gtgtgt clsfit(X_train y_train)

gtgtgt predictions = clspredict(X_test

y_test)

SMAC (Sequential Model-Based

Algorithm Configuration)

SMAC is a tool for automating certain

AutoML steps SMAC is useful for selection

of key features hyper-parameter

optimization and to speed up algorithmic

outputs

BOHB (Bayesian Optimization

Hyperband searches)

BOHB combines Bayesian hyper parameter

optimization with bandit methods for

faster convergence

Google H2O also have their respective

AutoML tools which are not covered here

but can be explored in specific cases

AutoML needs significant memory and

computational power to execute alternate

algorithms and compute results At

present GPU resources are extremely

costly to execute even simple Machine

Learning workloads such as CNN algorithm

to classify objects If multiple such

alternate algorithms should be executed

the computation dollar needed would be

exponential This is impractical infeasible

and inefficient for the current state of Data

Science industry Adoption of AutoML will

depend on two things one the maturity

of AutoML pipeline and second but more

important how quickly GPU clusters

become cheap The second being most

critical Selling Cloud GPU capacity could

be one of the motivation of several cloud

based infrastructure-running companies

to promote AutoML in the industry Also

AutoML will not replace the Data scientistrsquos

work but can provide augmentation

and speed to certain tasks such as data

standardization model tuning and

trying multiple algorithms It is only the

beginning for AutoML but this technique

has high relevance and usefulness for

solving ultra-complex problems

External Document copy 2019 Infosys Limited

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Neural Architecture Search (NAS) is a

component of AutoML and addresses

the important step of designing Neural

Network Architecture

Designing fresh Neural Net architecture

involves an expert establishing and

organizing Neural Network layers filters

or channels filter sizes selecting other

optimum Hyper parameters and so on

through several rounds of computational

Neural Architecture Search (NAS)

iterations Since AlexNet deep neural

network architecture won the ImageNet

(image classification based on ImageNet

dataset) competition in 2012 several

architecture styles such as VGG ResNet

Inception Xception InceptionResNet

MobileNet and NASNet have significantly

evolved However selection of the right

architecture for the right problem is also

a skill due to the presence of various

influencers such as applicability to the

problem accuracy number of parameters

memory and computational footprint and

size of the architecture that govern the

overall functioning efficiency

Neural Architecture Search tries to address

this problem space by automatically

selecting right Neural Network architecture

to solve a given problem

AutoML

HyperparameterOptimization

NAS

External Document copy 2019 Infosys Limited

Figure 130 Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

Search space The search space provides

boundary within which the specific

architecture needs to be searched

Computer Vision (captioning the scene

or product identification) based use

cases would need a different neural

network architecture style as against

Speech (speech transcription or speaker

classification) or unstructured Text (Topic

extraction intent mining) based use cases

Search space tries to provide available

catalogs of best in class architectures based

on other domain data and performance

Key Components of NAS

These are also usually hand crafted by

expert data scientists

Optimization method This is responsible

for providing mechanism to search the

best architecture It could be searched

and applied randomly or using certain

statistical or Machine Learning evaluation

approach such as Bayesian method or

reinforcement learning methods

Evaluation method This has the role

of evaluating the quality of architecture

considered by optimization method It

could be done using full training approach

or doing partial training and then applying

certain specialized methods such as partial

training or early stopping weights sharing

network morphism etc

For selective problem spaces as

compared to manual methods NAS have

outperformed and is showing definite

promise for future However it is still

evolving and not ready for production

usages as several architectures need to be

established and evaluated depending on

the problem space

Search Space

DAG Representation

Cell Block

Meta-Architecture

NAS Specic

Reinforcement Learning

Evolutionary Search

Gradient-Based Optimization

BayesianOptimization

Optimization Method

Components of NAS

Full Training

Partial Training

Weight-Sharing

Network Morphism

Hypernetworks

Evaluation Method

Figure 140 Components of NAS Source Liam Li Ameet Talwalkar What is neural architecture search

External Document copy 2019 Infosys Limited External Document copy 2019 Infosys Limited

In this paper we looked at some key H3 AI

areas by no means this is an exhaustive

list Amongst all discussed Transfer

Learning Capsule Networks Explainable

AI Generative AI are making interesting

Addressing H3 AI Trends at Infosys

things possible and looks highly promising

We are keenly experimenting with these

building early use cases and integrating

into our product stack Infosys Enterprise

Cognitive platform (iECP) to solve

interesting client problems Here is a look

at how we are employing these H3 trends

in the work we do

1. Explainable AI (XAI): Applicable wherever results need to be traced, e.g., Tumor Detection, Mortgage Rejection, Candidate Selection, etc.
2. Generative AI / Neural Style Transfer (NST): Art Generation, Sketch Generation, Image or Video Resolution Improvements, Data Generation/Augmentation, Music Generation
3. Fine Grained Classification: Vehicle Classification, Type of Tumor Detection
4. Capsule Networks: Image Re-construction, Image Comparison/Matching
5. Meta Learning: Intelligent Agents, Continuous Learning scenarios for document review and corrections
6. Transfer Learning: Identifying a person not wearing a helmet, Logo/brand detection in images, Speech model training for various accents and vocabularies
7. Single Shot Learning: Face Recognition, Face Verification
8. Deep Reinforcement Learning (RL): Intelligent Agents, Robots, Driverless Cars, Traffic Light Monitoring, Continuous Learning scenarios for document review and corrections
9. Auto ML: Invoice Attribute Extraction, Document Classification, Document Clustering
10. Neural Architecture Search (NAS): CNN or RNN based use cases such as Image Classification, Object Identification, Image Segmentation, Speaker Classification, etc.

Table 2.0: AI Use cases (Source: Infosys Research)


References

1. Explainable AI (XAI)
• https://christophm.github.io/interpretable-ml-book/
• https://simmachines.com/explainable-ai/
• https://www.cmu.edu/news/stories/archives/2018/october/explainable-ai.html
• https://medium.com/@QuantumBlack/making-ai-human-again-the-importance-of-explainable-ai-xai-95d347ccbb1c
• https://towardsdatascience.com/explainable-artificial-intelligence-part-2-model-interpretation-strategies-75d4afa6b739

2. Fine Grained Classification
• https://vision.cornell.edu/se3/wp-content/uploads/2015/02/BMVC14.pdf

3. Capsule Networks
• https://arxiv.org/pdf/1710.09829.pdf
• https://keras.io/examples/cifar10_cnn_capsule/
• https://www.youtube.com/watch?v=pPN8d0E3900
• https://www.youtube.com/watch?v=rTawFwUvnLE
• https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc

4. Meta Learning
• https://medium.com/@jrodthoughts/whats-new-in-deep-learning-research-openai-s-reptile-makes-it-easier-to-learn-how-to-learn-e0f6651a39f0
• http://proceedings.mlr.press/v48/santoro16.pdf
• https://towardsdatascience.com/whats-new-in-deep-learning-research-understanding-meta-learning-91fef1295660

5. Transfer Learning
• https://www.fast.ai/2018/07/23/auto-ml-3/

6. Single Shot Learning
• https://arxiv.org/pdf/1603.05106.pdf

7. Deep Reinforcement Learning (RL)
• https://deepmind.com/blog/article/deep-reinforcement-learning
• https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
• https://medium.com/@jonathan_hui/alphago-zero-a-game-changer-14ef6e45eba5
• https://arxiv.org/pdf/1811.12560.pdf

8. Auto ML
• https://www.ml4aad.org/automated-algorithm-design/algorithm-configuration/smac/
• https://www.fast.ai/2018/07/23/auto-ml-3/
• https://www.fast.ai/2018/07/16/auto-ml2/#auto-ml
• https://competitions.codalab.org/competitions/17767
• https://www.automl.org/automl/auto-sklearn/
• https://automl.github.io/HpBandSter/build/html/optimizers/bohb.html

9. Neural Architecture Search (NAS)
• https://www.oreilly.com/ideas/what-is-neural-architecture-search

10. Infosys Enterprise Cognitive Platform
• https://www.infosys.com/services/incubating-emerging-technologies/offerings/Pages/enterprise-cognitive-platform.aspx

About the author

Sudhanshu Hate is the inventor and architect of the Infosys Enterprise Cognitive Platform (iECP), a microservices API based Artificial Intelligence platform. He has over 21 years of experience in creating products and solutions and working with clients on industry problems. His current areas of interest are Computer Vision, Speech, and unstructured Text based AI possibilities.

To know more about our work on the H3 trends in AI, write to icets@infosys.com

© 2019 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.

For more information, contact askus@infosys.com

Infosys.com | NYSE: INFY | Stay Connected

