
Certified Artificial Intelligence Specialist TCS Ion



Become a full-stack data scientist with core skills in Machine Learning and Artificial Intelligence

Certified Artificial Intelligence Specialist – TCS Ion Procert Certified

The Certified Artificial Intelligence Specialist is a 200-hour program for working professionals and freshers alike who are looking to start, or switch into, a data science & AI career. This program is one of the most comprehensive available in India and covers every aspect of data science. It will equip you with all the technologies, conceptual knowledge and skillsets you need to crack any data science & AI interview, transition into a career in this field and prosper in it. This is a course designed to get you a job in data science! TCS has tied up with Edvancer as our certification partner. You will be certified by TCS Ion Procert after a final exam, and also by Edvancer on completing all the required projects in the course.

Program Highlights

Why Should You Take This Course

200 HOURS OF INDUSTRY EXPERT-LED SESSIONS AND VIDEOS

BOOT-CAMP STYLE TRAINING WITH 70% PRACTICAL CONTENT

TOP-CLASS FACULTY FROM TOP COMPANIES

24X7 LIFETIME ACCESS TO ONLINE LEARNING CONTENT & VIDEOS

CREATE A FULL-FLEDGED AI PRODUCT AS YOUR CAPSTONE PROJECT

CERTIFICATE FROM TCS ION PROCERT & EDVANCER WITH JOB ASSISTANCE

Create a job-ready project portfolio to establish your skills & credibility and attract recruiters

Most comprehensive curriculum covering everything from predictive analytics, machine learning, AI & Big Data

Get a huge hike in your salary on becoming a data scientist

We will work closely with you to help build your data science portfolio and start a data science career

Learn data science from India's top data science training institute as ranked by industry & students


Technologies Covered

Our Students Are Placed In

List of Industry Projects

1. Pharma: Predict the sales volume of counterfeit medicines in order to guide law enforcement agencies in cracking down on top counterfeiters.

2. BFSI: Identify fraudulent transactions for a credit card company to create an early warning system that prevents fraud in real time.

3. BPO: Create a machine learning system which automatically picks out the customer complaints most likely to go unresolved and escalates them.

4. E-Commerce: Analyze web server logs (big data) of an e-commerce portal to understand more about the products being browsed and sold.

5. Chatbot: Create a chatbot using deep learning & NLP to address user queries for an online travel portal.

6. Marketing: Predict buyer behaviour using AI (deep learning).

7. Facial recognition system: Create a facial recognition system through AI using tens of thousands of facial images.

Plus 5 Other Projects.


How it works

There are 2 options to learn this course. Choose whichever suits you best:

1. Live Online Classes Option:

Attend 160 hours of live online classes on weekends + go through 40 hours of self-paced videos.

Duration: 30 weekends (Sat & Sun)

Ask the faculty your questions and doubts during class, just as in a normal classroom. Online sessions are recorded so you can view and revise them later whenever you want, or catch up if you miss a class. Get the benefits of learning from home through fully interactive online classes. SQL & Big Data in Hadoop & Spark will be delivered through videos only.

2. Self-Paced + Faculty Support:

Learn through 200 hours of recorded class videos at your own time and pace. The curriculum, content, projects, assignments and everything else remain the same as in the live online classes. There are no deadlines or timelines to worry about. Get all your doubts and queries cleared by faculty through forums & emails. Learn easily at your own time and pace from anywhere. Enrol and start learning immediately!

About TCS Ion Procert Certification

TCS has tied up with Edvancer to be the certification partner for this course. After the course, get a certificate from TCS Ion Procert, the certification arm of India's largest IT company. The certificate is based on an exam conducted periodically by TCS Ion Procert, covering both the theoretical and practical aspects of the course. This certificate will prove extremely valuable on your CV and will act as an indicator of quality training and high capability in data science & AI.

Fees

Live Online Classes fee: Rs. 88,250 + 18% GST, discounted to Rs. 74,990/- + 18% GST. Get a 15% discount this week!

Self-Paced + Faculty Support fee: Rs. 70,500/- + 18% GST, discounted to Rs. 59,990/- + 18% GST. Get a 15% discount this week!

Pay the fee in 6 interest-free instalments* post a 10% down-payment. Effectively you will be paying just Rs. 13,275/- per month for 6 months post down-payment.

(*instalment offer subject to approval from our financing partner based on Aadhar card and 4 months' bank statements)

Payments can be made online through credit cards, debit cards or net-banking.

About Edvancer

Edvancer is India's leading data science training institute, providing a range of data science courses to learners at all levels. We have trained over 5,000 students and delivered 10,00,000+ hours of learning. Our alumni work with some of India's top companies in data science, and even globally. Our corporate clients include PwC, E&Y, L&T, HP, JP Morgan, Cognizant, Accenture, TCS, Microsoft etc.


Full Curriculum

Module 1: Predictive Analytics in R

What is this module about?: Predictive Analytics is the scientific process of deriving insights from raw data to support decision making, and is the core of data science. Through this module you will learn how to use analytical techniques and the R language to solve business problems. This is a comprehensive module which will take you from the basics of statistical techniques and the R language right up to building predictive models.

Tools to be learnt: R

Duration: 60 hours

Topic What does it mean?

Introduction to business analytics

• What is analytics & why is it so important?

• Applications of analytics

• Different kinds of analytics

• Various analytics tools

• Analytics project methodology

• Real world case study

In this section we shall provide you with an overview of the world of analytics. You will learn about the various applications of analytics, how companies are using analytics to prosper, and study the analytics project methodology through a real-world case study.

R Training

Fundamentals of R

• Installation of R & R Studio

• Getting started with R

• Basic & advanced data types in R

• Variable operators in R

• Working with R data frames

• Reading and writing data files to R

• R functions and loops

• Special utility functions

• Merging and sorting data

• Case study on data management using R

• Practice assignment

This part is all about learning how to manage and manipulate data and datasets, the very first step of analytics. We shall teach you how to use R to work with data using a case study.

Data visualization in R

• Need for data visualization

• Components of data visualization

• Utility and limitations

• Introduction to grammar of graphics

• Using the ggplot2 package in R to create visualizations

Data visualization is extremely important to understand what the data is saying and gain insights in just one glance. Visualization of data is a strong point of the R software and you will learn the same in this module.

Data preparation and cleaning using R

• Needs & methods of data preparation

• Handling missing values

• Outlier treatment

• Transforming variables

• Derived variables

• Binning data

• Modifying data with Base R

• Data processing with dplyr package

• Using SQL in R

• Practice assignment

Real-world data is rarely handed to you perfect on a platter. It will be dirty, with missing data points, incorrect values, and variables that need to be transformed or created before analysis. A typical analytics project spends about 60% of its time preparing data for analysis. This is a crucial process, as properly cleaned data results in more accurate and stable analysis. We shall teach you all the techniques required to be successful in this aspect.
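The module teaches these steps in R; as a quick illustration, here is a minimal sketch of the same cleaning workflow in Python with pandas (the dataset, column names and thresholds below are hypothetical, not from the course):

```python
import pandas as pd
import numpy as np

# Hypothetical customer data with a missing value and an extreme outlier
df = pd.DataFrame({"age": [25, 32, np.nan, 41, 29],
                   "income": [40_000, 52_000, 48_000, 1_000_000, 45_000]})

# Handle missing values: fill missing ages with the median
df["age"] = df["age"].fillna(df["age"].median())

# Outlier treatment: cap income at the 95th percentile (simple winsorising)
cap = df["income"].quantile(0.95)
df["income"] = df["income"].clip(upper=cap)

# Derived variable via binning: group ages into bands
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 40, 100],
                         labels=["young", "mid", "senior"])
print(df)
```

The same operations map one-to-one onto Base R and dplyr verbs covered in the module.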

Setting the base of business analytics


Understanding the data using univariate statistics in R

• Summarizing data, measures of central tendency

• Measures of variability, distributions

• Using R to summarize data

• Case study on univariate statistics using R

• Practice assignment

This is where you shall learn how to start understanding the story your data is narrating: summarizing the data, and checking its variability and shape by visualizing it. We shall take you through various ways of doing this in R and also solve a case study.

Hypothesis testing and ANOVA in R to guide decision making

• Introducing statistical inference

• Estimators and confidence intervals

• Central Limit theorem

• Parametric and non-parametric statistical tests

• Analysis of variance (ANOVA)

• Conducting statistical tests

• Practice assignment

With 95% confidence we can say that there is an 85% chance that people visiting this site twice will enrol for the course ☺. Here you learn how to create a hypothesis, test and validate it against data within a statistical framework, and present it with clear, formal numbers to support decision making.
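The module runs these tests in R; purely as a sketch of the idea, the same two-sample t-test and one-way ANOVA look like this in Python with scipy (the A/B data below is synthetic and hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical A/B experiment: time-on-page for two landing page designs
group_a = rng.normal(loc=5.0, scale=1.0, size=200)
group_b = rng.normal(loc=5.5, scale=1.0, size=200)

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# One-way ANOVA generalises this comparison to three or more groups
group_c = rng.normal(loc=5.2, scale=1.0, size=200)
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_anova:.4f}")
```

A small p-value lets you reject the null hypothesis of equal means at your chosen confidence level.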

Predictive modelling in R

1. Correlation and Linear regression

• Correlation

• Simple linear regression

• Multiple linear regression

• Model diagnostics and validation

• Case study

A statistical model is the core of predictive analytics, and regression is one of the most powerful tools for making predictions by finding patterns in data. You shall learn the basics of regression modelling hands-on through real-world cases.

2. Logistic regression

• Moving from linear to logistic

• Model assumptions and Odds ratio

• Model assessment and gains table

• ROC curve and KS statistic

• Case Study

Logistic regression is the work-horse of the predictive analytics world. It is used to make predictions where the outcome is binary, i.e. an X-or-Y scenario where we must predict which of the two will occur, given some data. This is a must-know technique and we shall make you comfortable with it through real-world problems.

3. Techniques of customer segmentation

• Need for segmentation

• Criterion of segmentation

• Types of distances

• Hierarchical clustering

• K-means clustering

• Deciding number of clusters

• Case study

Learn why and how to statistically divide a broad customer market into various segments of customers who are similar to each other so as to be able to better target and meet their needs in a cost effective manner. This is one of the most essential techniques in marketing analytics.

4. Time series forecasting techniques

• Need for forecasting

• What are time series?

• Smoothing techniques

• Time series models

• ARIMA

The ability to forecast the future is very important for any business, and forecasts need to be as accurate as possible for financial and strategic planning. In this module, learn the techniques of time series analysis without being misled by seasonal and cyclical effects.
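The course covers these techniques in R; as a quick taste of the smoothing idea, here is a sketch in Python with pandas on a made-up monthly sales series (the data and window choices are ours, not the course's):

```python
import pandas as pd
import numpy as np

# Hypothetical monthly sales: upward trend + yearly seasonality + noise
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
sales = (100 + np.arange(36) * 2
         + 10 * np.sin(np.arange(36) * 2 * np.pi / 12)
         + rng.normal(0, 3, 36))
ts = pd.Series(sales, index=idx)

# A 12-month centred moving average averages out the seasonal cycle,
# leaving the underlying trend visible
trend = ts.rolling(window=12, center=True).mean()

# Exponential smoothing weights recent observations more heavily
smooth = ts.ewm(alpha=0.3).mean()

print(trend.dropna().head())
```

ARIMA models, covered later in the topic, extend this idea by explicitly modelling the autocorrelation that remains after smoothing.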

5. Decision trees & Random Forests

• What are decision trees

• Entropy and Gini impurity index

• Decision tree algorithms

• CART

• Random Forest

• Case Study

Decision trees are predictive models which map observations about an item to conclusions about the item's target value. Learn decision trees, one of the most popular predictive analytics techniques.

6. Boosting Machines

• Concept of weak learners

• Introduction to boosting algorithms

• Adaptive Boosting

• Extreme Gradient Boosting (XGBoost)

• Case study

Want to win a data science contest on Kaggle or a data hackathon, or be known as a top data scientist? Then learning boosting algorithms is a must, as they provide a very powerful way of analysing data and solving hard-to-crack problems.

7. Cross Validation & Parameter Tuning

• Model performance measure with cross validation

• Parameter tuning with grid & randomised grid search

Learn how to make your model more accurate and perform at its best on real-world data.

Module 2: Machine Learning in Python

What is this module about?: Through this Machine Learning module, you will learn how to process, clean and visualize data, and automate decision making through data science, using Python, one of the most popular machine learning tools. You will learn cutting-edge machine learning techniques in Python.

Tools to be learnt: Python (Libraries like pandas, numpy, scipy, scikit-learn, bokeh, beautifulsoup)

Duration: 54 hours

Topic What does it mean?

Introduction to Machine Learning in Python

• What is machine learning & why is it so important?

• Applications of machine learning across industries

• Machine Learning methodology

• Machine Learning Toolbox

• Tool of choice- Python: what & why?

• Course Components

In this section we shall provide you with an overview of the world of machine learning. You will learn about the various applications of machine learning and how companies across all sorts of domains are solving their day-to-day and long-term business problems with it. We'll learn about the skill sets that make a machine learning expert capable of filling this vital role. Once the stage is set and we understand where we are heading, we discuss why Python is the tool of choice in data science.

Python Training

Introduction to Python

• Installation of Python framework and packages: Anaconda and pip

• Writing/Running python programs using Spyder, Command Prompt

• Working with Jupyter Notebooks

• Creating Python variables: Numeric, string and logical operations

• Basic Data containers: Lists, Dictionaries, Tuples & sets

• Practice assignment

Python is one of the most popular and powerful languages for data science, used by top companies like Facebook, Amazon, Google and Yahoo. It is free and open source. This module is all about learning how to start working with Python. We shall teach you how to use the Python language to work with data.
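The basic data containers listed above can be sketched in a few lines (the variable names and values are illustrative only):

```python
# Core Python data containers used throughout data work
scores = [72, 85, 90]                      # list: ordered, mutable
student = {"name": "Asha", "score": 85}    # dict: key-value lookup
point = (19.07, 72.87)                     # tuple: fixed-size, immutable
tags = {"python", "data", "python"}        # set: keeps unique items only

scores.append(88)            # lists grow in place
student["grade"] = "A"       # dicts accept new keys

print(len(scores), student["grade"], len(tags))   # → 4 A 2
```

Choosing the right container (ordered list vs keyed dict vs deduplicating set) is the first design decision in most data scripts.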

Iterative Operations & Functions in Python

• Writing for loops in Python

• List & Dictionary Comprehension

• While loops and conditional blocks

• List/Dictionary comprehensions with loops

• Writing your own functions in Python

• Writing your own classes and functions as class objects

• Practice assignment

This is where we move beyond simple data containers and learn about the possibilities and functionality hidden in their associated operators. We get introduced to the wonderful world of loops, and list and dictionary comprehensions. In addition to existing functions and classes, we learn to write our own custom functions and classes. This module sets the stage for handling data and implementing ML algorithms in Python.
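A minimal sketch of the constructs this topic covers, with illustrative names of our own choosing:

```python
# A list comprehension replaces an explicit accumulation loop
squares = [n * n for n in range(1, 6)]            # [1, 4, 9, 16, 25]

# A dictionary comprehension: word -> word length
lengths = {w: len(w) for w in ["data", "science"]}

# A custom function with default arguments
def normalise(values, lo=0.0, hi=1.0):
    """Rescale values linearly into the [lo, hi] range."""
    mn, mx = min(values), max(values)
    return [lo + (v - mn) * (hi - lo) / (mx - mn) for v in values]

# A small class bundling data with behaviour
class Counter:
    def __init__(self):
        self.count = 0
    def tick(self):
        self.count += 1
        return self.count

c = Counter()
c.tick(); c.tick()
print(squares, lengths["science"], normalise([10, 20, 30]), c.count)
```

Comprehensions, custom functions and small classes like these are exactly the building blocks later used to wrap ML pipelines.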


Data Summary; Numerical and Visual in Python

• Need for data summary

• Summarising numeric data in pandas

• Summarising categorical data

• Group wise summary of mixed data

• Need for visual summary

• Introduction to ggplot & Seaborn

• Visual summary of different data combinations

• Practice Exercise

Data summary is extremely important for understanding what the data is saying and gaining insights at a glance. Visualization is a strong point of Python through the ggplot package, built on the much-celebrated grammar of graphics. We also introduce another powerful package, seaborn, in the additional material section.
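The numeric, categorical and group-wise summaries listed above can be sketched with pandas in a few calls (the transaction data below is hypothetical):

```python
import pandas as pd

# Hypothetical transactions: one categorical and one numeric column
df = pd.DataFrame({"region": ["N", "S", "N", "S", "N"],
                   "amount": [120, 80, 150, 95, 130]})

# Numeric summary: count, mean, std and quartiles in one call
print(df["amount"].describe())

# Categorical summary: frequency of each level
print(df["region"].value_counts())

# Group-wise summary of mixed data
summary = df.groupby("region")["amount"].agg(["mean", "sum"])
print(summary)
```

These one-liners are the numerical counterpart of the visual summaries built later with ggplot and seaborn.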

Data Handling in Python using NumPy & Pandas

• Introduction to NumPy arrays, functions &properties

• Introduction to pandas

• Dataframe functions and properties

• Reading and writing external data

• Manipulating Data Columns

Python is a very versatile language, and in this module we expand on its capabilities for data handling. Focusing on the numpy and pandas packages, we learn how to manipulate data, which will eventually be useful for converting raw data into a form suitable for machine learning algorithms.

Machine Learning in Python

Basics of Machine Learning

• Business Problems to Data Problems

• Broad Categories of Business Problems

• Supervised and Unsupervised Machine Learning Algorithm

• Drivers of ML algorithms

• Cost Functions

• Brief introduction to Gradient Descent

• Importance of Model Validation

• Methods of Model Validation

• Introduction to Cross Validation and Average Error

In this module we understand how to transform business problems into data problems so that we can use machine learning algorithms to solve them. We survey the broad categories of business problems and the machine learning algorithms that address them. We'll learn what the ultimate goal of any machine learning algorithm is, and go through a brief description of Gradient Descent, the mother of many modern optimisation methods. We'll wrap up with a discussion of the importance and methods of validating our results.
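Gradient descent itself fits in a few lines. This is a minimal sketch, assuming a one-parameter model y = w·x with a mean-squared-error cost on tiny synthetic data (all values and the learning rate are illustrative):

```python
import numpy as np

# Synthetic data: the true slope is 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w, lr = 0.0, 0.01            # initial weight and learning rate
for _ in range(500):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)   # d/dw of mean squared error
    w -= lr * grad                       # step downhill on the cost surface

print(round(w, 3))            # converges towards 3.0
```

Every update moves w a small step against the cost gradient; the same loop, scaled up, is what drives the neural network training seen later in the course.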

Generalised Linear Models in Python

• Linear Regression

• Limitation of simple linear models and need of regularisation

• Ridge and Lasso Regression (L1 & L2 Penalties)

• Introduction to Classification with Logistic Regression

• Methods of threshold determination and performance measures for classification score models

• Case Studies

We start implementing machine learning algorithms in this module. We also get exposed to some important concepts related to regression and classification which we will use in later modules as well. This is also where we get introduced to scikit-learn, the legendary Python library famous for its machine learning prowess.
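A minimal scikit-learn sketch of the three model families named above, on tiny synthetic data (the alpha values and dataset parameters are illustrative choices, not the course's):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, LogisticRegression
from sklearn.datasets import make_classification

# Regularised regression on a tiny dataset
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.9])

ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty can zero them out entirely

# Binary classification with logistic regression
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(Xc, yc)

print(ridge.coef_, lasso.coef_, round(clf.score(Xc, yc), 2))
```

Note how the ridge coefficient is pulled below the ordinary least-squares slope: that shrinkage is exactly what the regularisation penalty buys in exchange for stability.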

Tree Models using Python

• Introduction to decision trees

• Tuning tree size with cross validation

• Introduction to bagging algorithm

• Random Forests

• Grid search and randomized grid search

• ExtraTrees (Extremely Randomised Trees)

• Partial Dependence Plots

• Case Studies

• Home exercises

In this module we will learn a very popular class of machine learning models: rule-based tree structures, also known as decision trees. We'll examine their tendency to overfit and learn how to use bagging methodologies to arrive at a new technique known as Random Forest to analyse data. We'll further extend the idea of randomness in the ExtraTrees algorithm. In addition, we learn about powerful tuning tools used with all kinds of machine learning algorithms: GridSearchCV and RandomizedSearchCV.
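A compact sketch of a Random Forest tuned with GridSearchCV, here on scikit-learn's built-in iris data (the parameter grid values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Grid search over forest size and tree depth, scored by 5-fold CV
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 4]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

RandomizedSearchCV has the same interface but samples the grid instead of exhausting it, which scales much better as the parameter space grows.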


Boosting Algorithms using Python

• Concept of weak learners

• Introduction to boosting algorithms

• Adaptive Boosting

• Extreme Gradient Boosting (XGBoost)

• Case study

• Home exercise

Want to win a data science contest on Kaggle or data hackathons or be known as a top data scientist? Then learning boosting algorithms is a must as they provide a very powerful way of analysing data and solving hard to crack problems.
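As a sketch of the two boosting flavours named above, here are scikit-learn's AdaBoost and gradient boosting classifiers on synthetic data (the course teaches XGBoost through its own library; the built-in GradientBoostingClassifier is used here only as a stand-in for the gradient-boosting idea):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# AdaBoost: each new weak learner focuses on previously misclassified rows
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Gradient boosting: each new tree fits the residual errors of the ensemble
gbm = GradientBoostingClassifier(n_estimators=100,
                                 random_state=1).fit(X_tr, y_tr)

print(round(ada.score(X_te, y_te), 2), round(gbm.score(X_te, y_te), 2))
```

Both build an additive ensemble of weak learners; they differ in how the next learner is told what the ensemble currently gets wrong.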

Support Vector Machines (SVM) and KNN in Python

• Introduction to idea of observation based learning

• Distances and Similarities

• K Nearest Neighbours (KNN) for classification

• Introduction to SVM for classification

• Regression with KNN and SVM

• Case study

• Home exercises

We step into the powerful world of "observation-based algorithms", which can capture patterns in the data that otherwise go undetected. We start the discussion with KNN, which is fairly simple. After that we move to SVM, which is very powerful at capturing non-linear patterns in the data.
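A minimal side-by-side sketch of KNN and an RBF-kernel SVM on scikit-learn's iris data (the k and C values are illustrative defaults):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# KNN: classify a flower by majority vote of its 5 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=5)

# SVM with an RBF kernel can capture non-linear class boundaries
svm = SVC(kernel="rbf", C=1.0)

knn_score = cross_val_score(knn, X, y, cv=5).mean()
svm_score = cross_val_score(svm, X, y, cv=5).mean()
print(round(knn_score, 3), round(svm_score, 3))
```

KNN makes no assumption about the decision boundary's shape at all, while the SVM's kernel choice controls how flexible its boundary is.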

Unsupervised learning in Python

• Need for dimensionality reduction

• Introduction to Principal Component Analysis (PCA)

• Difference between PCAs and Latent Factors

• Introduction to Factor Analysis

• Patterns in the data in absence of a target

• Segmentation with Hierarchical Clustering and K-means

• Measure of goodness of clusters

• Limitations of K-means

• Introduction to density based clustering (DBSCAN)

Many machine learning algorithms become difficult to work with when the data has many variables. PCA comes to the rescue, solving the problems that arise from highly correlated variables. The same idea extends to finding hidden factors in the data with Factor Analysis, which is used extensively in surveys and marketing analytics. We also learn about two very important segmentation algorithms, K-means and DBSCAN, and understand their differences and strengths.
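A short sketch chaining PCA into the two clustering algorithms named above, on iris (the eps and cluster-count choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA: compress 4 correlated features into 2 uncorrelated components
X_2d = PCA(n_components=2).fit_transform(X_std)

# K-means needs the number of clusters up front ...
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_2d)

# ... while DBSCAN discovers dense regions itself (outliers get label -1)
db = DBSCAN(eps=0.6, min_samples=5).fit(X_2d)

print(X_2d.shape, sorted(set(km.labels_)), sorted(set(db.labels_)))
```

The contrast in the printed label sets shows the key difference: K-means always returns exactly the requested clusters, while DBSCAN's count depends on the data's density structure.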

Neural Networks

• Introduction to Neural Networks

• Single layer neural network

• Multiple layer Neural network

• Back propagation Algorithm

• Moment up and decaying learning rate in context of gradient descent

• Neural Networks implementation in Python

Artificial Neural Networks are the building blocks of artificial intelligence. Learn the techniques which replicate how the human brain works and create machines which can solve problems like humans.
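The single- and multi-layer networks and backpropagation listed above can be sketched with scikit-learn's MLPClassifier (the hidden-layer size is an illustrative choice; the course's own Python implementation may differ):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)   # NNs train best on scaled inputs

# One hidden layer of 16 units, trained by backpropagation
# (gradient descent on the network's loss)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_std, y)
print(round(net.score(X_std, y), 3))
```

The same forward-pass/backpropagation loop, with more layers and data, is what the TensorFlow and Keras module scales up.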

Text Mining in Python

• Quick Recap of string data functions

• Gathering text data using web scraping with urllib

• Processing raw web data with BeautifulSoup

• Interacting with Google search using urllib with custom user agent

• Collecting twitter data with Twitter API

• Introduction to Naive Bayes

• Feature Engineering for text Data

• Feature creation with TFIDF for text data

• Case Studies

Unstructured text data accounts for more and more interaction records as our daily lives move online. In this module we start by looking at ways to collect all that data. In addition to scraping simple web data, we'll learn to use data APIs, with the Twitter API as the example, right from creating a developer account on Twitter. We then discuss Naive Bayes, one of the most powerful algorithms for text data, and see how to mine the text.
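The TF-IDF-plus-Naive-Bayes pipeline named above fits in a few lines of scikit-learn; the labelled complaint snippets below are entirely made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled complaints: 1 = urgent, 0 = routine
texts = ["refund not received very angry",
         "please update my billing address",
         "order lost no response for weeks",
         "how do I change my password",
         "terrible service charged twice angry",
         "thanks quick question about invoice"]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF turns raw text into weighted word-frequency features
vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Naive Bayes: a fast, strong baseline for text classification
clf = MultinomialNB().fit(X, labels)
pred = clf.predict(vec.transform(["angry about my lost refund"]))
print(pred)
```

The same featurise-then-classify pattern underlies the BPO complaint-escalation project listed earlier in this brochure.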

Ensemble Methods in Machine Learning

• Making use of multiple ML models taken together

• Simple Majority vote and weighted majority vote

• Blending

• Stacking

• Case Study

Individual machine learning models extract patterns from the data in different ways, which at times means they capture different patterns. In this module we move past sticking to just one algorithm and ignoring the others' results: we learn to combine multiple ML models to make our predictive modelling solutions even more powerful.
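A minimal majority-vote ensemble, sketched with scikit-learn's VotingClassifier over three different model families (the estimator choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hard voting: each model casts one vote, the majority class wins
vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="hard",
)
vote.fit(X, y)
print(round(vote.score(X, y), 3))
```

Weighted voting, blending and stacking refine the same idea: instead of one vote each, models contribute according to how trustworthy their predictions are.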

Bokeh

• Introduction to Bokeh charts and plotting

For making quick prototypes of your solutions, which can later be scaled as interactive visualisations in the form of standalone or hosted web pages, we introduce you to Bokeh, an evolving Python library with all the tools you'll need to do the same.

Version Control using Git and Interactive Data Products

• Need and Importance of Version Control

• Setting up git and github accounts on local machine

• Creating and uploading GitHub Repos

• Push and pull requests with GitHub App

• Merging and forking projects

• Examples of static and interactive data products

We finish this module with a discussion of two very important aspects of a data scientist's work. The first is version control, which enables you to work on large projects with multiple team members scattered across the globe; we learn git and GitHub, the most widely used public version-control platform. The second is packaging your analyses as static and interactive data products.

Module 3: Deep Learning (Artificial Intelligence) Using Tensorflow and Keras

What is this module about?: Through this module, you will learn the various techniques used in the world of artificial intelligence, such as deep learning, reinforcement learning, NLP and computer vision, using the latest Python libraries TensorFlow and Keras. This module will put you on the cutting edge of technology and make you future-proof.

Tools to be learnt: Tensorflow and Keras

Class Duration: 30 hours

Topic What does it mean?

Introduction to AI and Deep Learning

• What is AI?

• How will AI change the world?

• What is Deep Learning?

• Uses of Deep Learning?

• Examples of Deep Learning & AI

Get introduced to the world of Artificial Intelligence, which is poised to change the entire world. Understand what deep learning is and how it is used in AI.

Getting Started with Tensorflow

• Setting up TensorFlow and a GPU instance on GCP

• Understanding the computation graph and basics of TensorFlow

• Implementing a simple perceptron in TensorFlow

• Implementing a multi-layer neural network in TensorFlow

• Visualising training with TensorBoard

TensorFlow™ is an open source software library in Python for high performance numerical computation. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning.

Deep Feed Forward & Convolutional Neural Networks

• Implementing deep neural net for image classification

• Understanding convolutions, strides, padding, filters etc

• Implementing CNN with tensor flow

• Regularizing with dropout

• Learning rate decay and its effects

• Batch normalisation and its effects

A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle and information flows in only one direction. A convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. Learn these techniques for classifying images.

Introduction to Keras

• Basics of Keras

• Composing various models in Keras

• Parameter tuning in Keras with previous examples

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow. It was developed with a focus on enabling fast experimentation and allows for easy and fast prototyping.

Recurrent Neural Networks, Long-Short Term Memory and Gated Recurrent Unit

• Intro to RNN architecture

• Modelling sequences

• Limitations of RNNs

• Introduction to LSTM and use cases with implementation (text data)

• Introduction to GRU and implementation (text data)

A recurrent neural network (RNN) is a class of artificial neural network where connections between nodes form a directed graph along a sequence. Long Short-Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks. These techniques are very popular for Natural Language Processing.

Autoencoders, Generative Adversarial Networks, Hopfield Networks

• Autoencoders and dimensionality reduction

• GANs and their implementation

• Hopfield networks

• Variational autoencoders

• Word2Vec

• GloVe

An autoencoder is a type of artificial neural network used to learn efficient data codings for the purpose of dimensionality reduction. Generative adversarial networks (GANs) are implemented as a system of two neural networks contesting with each other. A Hopfield network stores one or more patterns and recalls the full pattern from partial input. These are techniques used in computer vision.


Module 4: Big Data Processing & Analysis using Hadoop & Spark (Videos Only)

What is this module about?: Through this Big Data & Hadoop module, you will learn big data analytics & processing using Hadoop & Spark. Learn the multiple tools which are part of the Hadoop & Spark eco-system and learn to work with data in the order of terabytes and larger.

Tools to be learnt: Hadoop eco-system and Spark

Duration: 36 hours of recorded class videos

Topic What does it mean?

Introduction to Big Data & Hadoop

• What is Big Data?

• Characteristics of big data

• Traditional data management systems and their limitations

• Business applications of big data

• What is Hadoop?

• Why is Hadoop used?

• The Hadoop eco-system

• Big data/Hadoop use cases

In this module, you will understand the meaning of big data, how traditional systems are limited in their ability to handle big data and how the Hadoop eco-system helps in solving this problem. You will learn about the various parts of the Hadoop eco-system and their roles.

Managing a Big Data Eco-system

• Big Data technology foundations

• Big data management systems

• Approach to big data analytics

• Models to support big data analytics

• Integrating big data in organizations

• Streaming data

• Big data solutions

In this module you will learn how a big data eco-system can be implemented in organizations and the benefits it can bring.

HDFS (Hadoop Distributed File System)

• HDFS Architecture

• HDFS internals and use cases

• HDFS Daemons

• Files and blocks

• Namenode memory concerns

• Secondary namenode

• HDFS access options

• Installing and configuring Hadoop

• Hadoop daemons

• Basic Hadoop commands

• Understand HDFS federation

In this module you will learn the basic Hadoop shell commands. You will also learn about HDFS, Hadoop's distributed file storage system: why it is used, how it differs from traditional file systems, and how files are read and written in it. You will work hands-on in implementing what is taught in this module.



HBase concepts

• Architecture and role of HBase

• Characteristics of HBase schema design

• Implement basic programming for HBase

• Combine best capabilities of HDFS and HBase

HBase is a distributed, versioned, column-oriented, multidimensional storage system, designed for high performance and high availability. Learn all about HBase in this module.

Introduction to MapReduce

• MapReduce basics

• Functional programming concepts

• List processing

• Mapping and reducing lists

• Putting them together in MapReduce

• Word Count example application

• Understanding the driver, mapper and reducer

• Closer look at MapReduce data flow

• Build iterative Mapreduce applications

• Understand combiners & partitioners

• Hands-on exercises

In this module, you will understand the MapReduce framework, how it works on HDFS, and learn the basics of MapReduce programming and data flow. (Basic Java knowledge will be required in the MapReduce modules; videos covering it will be provided.)
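The Word Count data flow above can be walked through on a single machine in Python: the map phase emits (word, 1) pairs, the shuffle groups pairs by key, and the reduce phase sums each group. Hadoop performs the same steps distributed across a cluster (typically in Java); this is only a local analogy.

```python
from itertools import groupby

def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum the counts for one word."""
    return (word, sum(counts))

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Map
pairs = [kv for line in lines for kv in mapper(line)]
# Shuffle & sort: group intermediate pairs by key
pairs.sort(key=lambda kv: kv[0])
# Reduce
result = dict(
    reducer(word, (c for _, c in group))
    for word, group in groupby(pairs, key=lambda kv: kv[0])
)
print(result["the"], result["fox"])  # 3 2
```

A combiner, covered in this module, would simply run the same summation on each mapper's local output before the shuffle, cutting network traffic.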

Analyzing data with Pig

• Pig architecture, program structure and execution process

• Introduction to Pig Latin

• Joins & filtering using Pig

• Group & co-group

• Schema merging and redefining functions

• Pig functions

• Hands-on examples

Pig is a platform for analysing large data sets through a high-level language. In this module you will learn to query and analyse large amounts of data stored in distributed storage systems.

Using Hive for Data Warehousing

• Introduction to Hive architecture

• Using Hive command line interface

• Create & execute Hive queries

• Data types, operators & functions in Hive

• Basic DDL operations

• Data manipulation using Hive

• Advanced querying with Hive

• Different join operations in Hive

• Performance tuning & query optimization in Hive

• Security in Hive

Hive is a data warehouse software for managing and querying large-scale datasets. It uses an SQL-like language, HiveQL, to query the data. Learn Hive in depth in this module.

Transferring bulk data using Sqoop

• Basics of Sqoop & Sqoop architecture

• Import data into Hive using Sqoop

• Export data from HDFS using Sqoop

• Drivers and connectors in Sqoop

• Importing and exporting data in Sqoop

Sqoop is a tool designed to transfer data between Hadoop and relational databases. Learn how to use Sqoop in this module.

Streaming big data into Hadoop using Flume

• Flume architecture

• Use Flume configuration file

• Configure & build Flume for data aggregation

• Hands-on exercise

In this module, learn to work with Flume, a service for efficiently collecting, aggregating, and moving large amounts of streaming data into HDFS.
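A Flume agent is defined entirely through a configuration file wiring a source, a channel, and a sink together. The fragment below is a sketch of a minimal single-agent setup (the names `a1`, `r1`, `c1`, `k1` are arbitrary, and the port and HDFS path are placeholders): a netcat source feeding an HDFS sink through an in-memory channel.

```
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events
a1.sinks.k1.channel = c1
```

Note the wiring convention: a source lists its `channels` (plural, it can fan out), while a sink names exactly one `channel`.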

Scala Basics

• Scala environment setup

• Scala REPL

• Scala classes and Objects

• Scala variables

• Scala functions, anonymous functions and methods

• Scala closures, collections & traits

Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. Apache Spark is developed in Scala. In this module we will learn the functional programming language Scala.


Spark Core

• Apache Spark and Spark Core Programming

• Difference between Spark & Hadoop frameworks

• Key components of Spark eco-system

• Initialize a Spark application

• Run a Spark job on YARN

• Create an RDD from a file or directory in HDFS

• Persist an RDD in memory or on disk

• Perform Spark transformations on an RDD

• Perform Spark actions on an RDD

• Create and use broadcast variables and accumulators

• Configure Spark properties

Apache Spark is a cluster computing platform designed for fast, general-purpose Big Data processing, and it is faster than MapReduce. Spark programs can be written in Java, Scala, or Python; because Spark itself is written in Scala, a JVM language, Scala is the primary choice of language. Learn the highly in-demand technologies of Spark and Scala in this module.
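A rough single-machine analogy of the RDD model above, using Python generators: transformations (map, filter) are lazy and only describe the computation, while an action (here, a sum) actually triggers it. In real (Py)Spark this chain would run distributed over a cluster, e.g. starting from a file in HDFS.

```python
data = range(1, 11)                         # stand-in for an RDD

squared = (x * x for x in data)             # "transformation": nothing runs yet
evens = (x for x in squared if x % 2 == 0)  # another lazy transformation

total = sum(evens)                          # "action": computation happens here
print(total)  # 4 + 16 + 36 + 64 + 100 = 220
```

Laziness is what lets Spark plan and optimise a whole pipeline of transformations before touching the data, and persisting an RDD simply caches an intermediate stage of such a pipeline.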

Module 5: Data Analysis in SQL (Videos Only)

What is this module about?: This Data Analyst using SQL video tutorial teaches you how to use the ever popular SQL language to analyse data stored in databases. SQL is a requirement in almost all analytics roles and this module will make you eligible to work as a data analyst. In this SQL tutorial you will learn how to communicate with databases, extract data from them, manipulate and analyse it & create reports.

Tools to be learnt: MS SQL

Class Duration: 6 hours of pre-recorded videos

Topic What does it mean?

Introduction To SQL

• What is SQL?

• Why SQL?

• What are relational databases?

• SQL command group

• MS SQL Server installation

• Exercises

Structured Query Language (SQL) is a standard language for storing, manipulating and retrieving data in databases. It is a heavily used language and a must know for every data scientist. Here we will introduce you to SQL using MS SQL.
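The module itself teaches MS SQL Server, but you can try your first SQL statements anywhere; the sketch below uses Python's built-in sqlite3 driver with an invented `employees` table just to show what a first session looks like.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE employees (name TEXT, city TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Mumbai", 50000), ("Ravi", "Delhi", 60000), ("Meera", "Mumbai", 55000)],
)

# A first SELECT with a WHERE filter and ORDER BY
cur.execute("SELECT name FROM employees WHERE city = 'Mumbai' ORDER BY name")
print([row[0] for row in cur.fetchall()])  # ['Asha', 'Meera']
```

The `SELECT ... FROM ... WHERE` shape carries over unchanged to MS SQL Server; only connection handling differs between database engines.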

SQL Data Types & Operators

• SQL Data Types

• Filtering Data

• Arithmetic Operators

• Comparison operators

• Logical Operators

• Exercises

Learn about various types of data and how to filter and conduct basic operations on data in databases using SQL.

Useful Operations in SQL

• Distinct Operation

• Top N Operation

• Sorting results

• Combine results using Union

• Null comparison

• Alias

Learn more advanced operations on data.

Aggregating Data in SQL

• Aggregate functions

• Group By clause

• Having clause

• Over clause

• Exercises

Aggregate data using various conditions and clauses in SQL to gain the answers you are looking for.
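A small, made-up sales table makes the aggregation clauses above concrete. This sketch uses Python's built-in sqlite3; the `GROUP BY` / `HAVING` syntax is the same in MS SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("North", 100), ("North", 250), ("South", 80), ("East", 300)])

# Total per region, keeping only regions whose total exceeds 150
cur.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 150
    ORDER BY region
""")
print(cur.fetchall())  # [('East', 300), ('North', 350)]
```

Note the division of labour: `WHERE` filters rows before grouping, `HAVING` filters the groups after aggregation.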


Writing Sub-Queries in SQL

• What are sub-queries?

• Sub-query rules

• Writing sub-queries

• Exercises

A subquery is a SQL query within a query. Subqueries are nested queries that provide data to the enclosing query. In this module you will learn how to write various sub-queries.
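A typical nested query, sketched with sqlite3 on an invented `staff` table: the inner query computes the average salary and hands it to the enclosing query's `WHERE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
cur.executemany("INSERT INTO staff VALUES (?, ?)",
                [("Asha", 40000), ("Ravi", 60000), ("Meera", 80000)])

# Who earns more than the average salary? The subquery runs first.
cur.execute("""
    SELECT name FROM staff
    WHERE salary > (SELECT AVG(salary) FROM staff)
    ORDER BY name
""")
print([r[0] for r in cur.fetchall()])  # ['Meera']
```

The average here is 60000, so only salaries strictly above it survive the outer filter.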

Common function in SQL

• Ranking functions

• Date & time functions

• Logical functions

• String functions

• Conversion functions

• Mathematical functions

• Exercises

Learn some of the common functions available in SQL to transform data into more meaningful results.

Analytic Functions in SQL

• What are analytic functions?

• Various analytic functions

• SQL syntax for analytic functions

• Exercises

Here you will learn various analytic functions in SQL to undertake data analysis.

Writing DML Statements

• What are DML Statements?

• Insert statement

• Update statement

• Delete statement

• Exercises

DML is an abbreviation of Data Manipulation Language in SQL. It is used to insert, update, delete and retrieve data in databases.
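The three modifying DML statements in one short sqlite3 session on a throwaway `tasks` table (the statements themselves are standard SQL and carry over to MS SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tasks (id INTEGER, status TEXT)")

cur.execute("INSERT INTO tasks VALUES (1, 'open')")   # add rows
cur.execute("INSERT INTO tasks VALUES (2, 'open')")
cur.execute("UPDATE tasks SET status = 'done' WHERE id = 1")  # modify a row
cur.execute("DELETE FROM tasks WHERE id = 2")                 # remove a row

cur.execute("SELECT id, status FROM tasks")
print(cur.fetchall())  # [(1, 'done')]
```

The `WHERE` clause on `UPDATE` and `DELETE` is what limits their scope; without it, they touch every row in the table.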

Writing DDL Statements

• What are DDL Statements?

• Create statement

• Alter statement

• Drop statement

• Exercises

DDL refers to "Data Definition Language", a subset of SQL statements that change the structure of the database schema in some way, typically by creating, deleting, or modifying schema objects such as databases, tables, and views.

Using Constraints in SQL

• What are constraints?

• Not Null Constraint

• Unique constraint

• Primary key constraint

• Foreign key constraint

• Check constraint

• Default Constraint

• Exercises

Constraints provide a standard mechanism to maintain the accuracy and integrity of the data inside table. There are several different types of constraints in SQL which you will learn here.
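Constraints are easiest to appreciate when you watch them reject bad data. In this sqlite3 sketch, a `PRIMARY KEY` forbids duplicate ids and a `CHECK` constraint forbids negative prices; violations raise `IntegrityError` (MS SQL Server reports the equivalent errors with the same constraint semantics).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        price INTEGER CHECK (price >= 0)
    )
""")
cur.execute("INSERT INTO products VALUES (1, 100)")

# Both of these rows violate a constraint and are rejected.
for bad_row in [(1, 50), (2, -10)]:   # duplicate key, then negative price
    try:
        cur.execute("INSERT INTO products VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError:
        print("rejected:", bad_row)

cur.execute("SELECT COUNT(*) FROM products")
print(cur.fetchone()[0])  # 1 -- only the valid row was stored
```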

SQL Joins

• What are joins?

• Cartesian Join

• Inner Join

• Left & Right Join

• Full Join

• Self Join

A SQL Join statement is used to combine data or rows from two or more tables. Learn the various joins in SQL in this module.
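The difference between an inner and a left join shows up immediately on two tiny invented tables: the inner join drops customers without orders, while the left join keeps them with NULLs. Sketched with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
cur.execute("INSERT INTO orders VALUES (1, 'laptop')")

# INNER JOIN: only customers with a matching order
cur.execute("""SELECT c.name, o.item FROM customers c
               INNER JOIN orders o ON o.customer_id = c.id""")
print(cur.fetchall())  # [('Asha', 'laptop')]

# LEFT JOIN: every customer, NULL where no order matches
cur.execute("""SELECT c.name, o.item FROM customers c
               LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.name""")
print(cur.fetchall())  # [('Asha', 'laptop'), ('Ravi', None)]
```

A cartesian join is what you get with no `ON` condition at all: every row of one table paired with every row of the other.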

Views in SQL

• What are views?

• Create View

• Drop view

• Update view

A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, its data is not stored in the database; it is a query stored as an object, deriving its data from one or more tables each time it is used. Learn how to create and manage these views in this module.
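A view really is just a stored query, as this sqlite3 sketch shows: `high_scores` (an invented name) is not a separate table, yet it can be queried like one, and dropping it leaves the underlying table untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
cur.executemany("INSERT INTO scores VALUES (?, ?)",
                [("Asha", 90), ("Ravi", 40), ("Meera", 75)])

# The view stores the SELECT, not its result rows.
cur.execute("CREATE VIEW high_scores AS "
            "SELECT player FROM scores WHERE points >= 70")
cur.execute("SELECT player FROM high_scores ORDER BY player")
print([r[0] for r in cur.fetchall()])  # ['Asha', 'Meera']

cur.execute("DROP VIEW high_scores")   # the scores table is unaffected
```

Because the view re-runs its query on each use, it always reflects the current contents of `scores`.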