Certification Program in Big Data Analytics

Big Data Analytics - Intellipaat...2020/10/08  · Big Data Analytics is the most in-demand and highly paid job profile in the market. In this course, you will master skills such as


Certification in

Big Data Analytics In collaboration with IBM

Table of Contents

1. About the Program

2. Partnering with E&ICT, IIT Guwahati

3. Collaborating with IBM

4. About Intellipaat

5. Key Features

6. Career Support Services

7. Eligibility Criteria & Application Process

8. Learning Path

9. Course Advisors

10. Program Curriculum

11. Certification

12. Success Stories

13. Contact Us

About the Program

Big Data Analytics is one of the most in-demand and highly paid job profiles in the market. In this course, you will master skills such as statistics, data manipulation, data analysis, data visualization, handling large volumes of static and real-time data, and Big Data tools. This certification program in Big Data Analytics by E&ICT, IIT Guwahati, and Intellipaat is designed to provide high-end skills to working professionals and young engineers and to help them grow in their careers.

This course provides academic rigor and research elements, along with real-time industry exposure, through case studies and project work. In this program, you will be mentored by top industry experts, and sessions will be delivered by professors from top universities and professionals from the industry.

This certification program is a blend of self-paced online videos, live virtual classes, hands-on projects, and lab sessions. As part of this program, you will complete 11 courses, namely, Statistics for Analytics, Data Analytics Using Excel, Data Analytics Using SQL, Python for Analytics, Basic Java & Linux, Big Data & Hadoop, Apache Spark, MongoDB, Administrating Big Data Systems, Business Intelligence & Data Mining, and Tableau.

Partnering with E&ICT, IIT Guwahati

This certification program in Big Data Analytics is offered in partnership with E&ICT Academy, IIT Guwahati. E&ICT Academy, IIT Guwahati is an initiative of MeitY (Ministry of Electronics and Information Technology, Govt. of India), formed with a team of IIT Guwahati professors to provide high-quality education programs to working professionals.

Upon the completion of this program, you will:

• Receive a joint certificate from E&ICT, IIT Guwahati, and Intellipaat
• Have the alumni status of E&ICT, IIT Guwahati
• Have worked on 30+ case studies
• Have completed a certification program from a top university
• Have taken part in a program run in collaboration with IBM
• Gain an industry-recognized certification from IBM
• Have attended sessions from top professors and industry experts

Collaborating with IBM

IBM is one of the leading innovators and one of the biggest players in creating Big Data Analytics tools. Top subject matter experts from IBM will share their knowledge of Data Analytics and Big Data through this training program, helping you gain breadth of knowledge and industry experience.

Benefits for students from IBM:

Industry-recognized IBM certification

Access to IBM Watson for hands-on training and practice

Industry in-line case studies and project work

About Intellipaat

Intellipaat is one of the leading e-learning training providers, with more than 600,000 learners across 53+ countries. We are on a mission to democratize education, as we believe that everyone has the right to quality education.

Our courses are delivered by subject matter experts from top MNCs, and our pedagogy is designed to make difficult topics quick to learn. Our 24/7 technical support and career services help learners jump-start their careers in their dream companies.

Key Features

• 400+ HRS INSTRUCTOR-LED TRAINING
• 24/7 SUPPORT
• 20+ REAL-TIME CASE STUDIES & PROJECTS
• CERTIFICATION PROGRAM FROM A TOP UNIVERSITY
• CERTIFICATION FROM IBM
• SESSIONS FROM TOP PROFESSORS AND INDUSTRY EXPERTS
• CAREER SERVICES
• EXECUTIVE ALUMNI STATUS

Career Support

DEDICATED LEARNING MANAGER

Get mentored by experts, receive personalized feedback on your

performance, and clarify your doubts in no time

PERSONALIZED INDUSTRY MENTOR

We match your profile with the right industry mentor based on your past skills. Your mentor’s guidance will help you prepare yourself

MOCK INTERVIEWS

Mock interviews prepare you to crack interviews with top employers

GUARANTEED INTERVIEWS & JOB SUPPORT

Get interviewed by our 400+ hiring partners and enhance

your chances of getting placed

RESUME PREPARATION

Get assistance in creating a world-class resume from our

career services team

Eligibility Criteria & the Application Process

Those wishing to enroll in this certification program in Big Data Analytics will be required to

follow the admission process mentioned below.

Eligibility Criteria

For the admission to the certification program in Big Data Analytics, candidates should:

Have a bachelor’s degree with an average of 50% or higher marks and a basic

understanding of programming concepts

Be working professionals with zeal to build a career in Big Data Analytics

Application Process

The application process consists of three simple steps. Candidates have to submit their

application. An offer of admission will be made to the selected candidates, and their

application will be accepted upon the payment of the admission fee.

SUBMIT APPLICATION

Tell us a bit about yourself and why you want to join this program

ADMISSION TEST & APPLICATION REVIEW

Clear the admission test and have a personal interview with our

interview panel

ADMISSION LETTER

Shortlisted candidates will be offered an admission letter

Courses offered

INSTRUCTOR LED TRAINING COURSES

1. MongoDB NoSQL and MS SQL Database
2. Statistics for Analytics
3. Data Processing Tools
4. Data Analytics Using Python
5. Hadoop and Its Ecosystems
6. Apache Spark and Scala
7. Python for Spark
8. Python for Spark: Functional and Object-Oriented Model
9. Tuning Models
10. Advanced Models / Ensemble Learning
11. Model Deployment
12. Apache Spark Framework and RDDs
13. PySpark SQL and Data Frames
14. PySpark Streaming
15. Introduction to PySpark Machine Learning
16. Streaming and Real-time Messaging Systems in Big Data
17. Administrating Big Data Systems
18. Business Intelligence and Data Mining
19. Data Visualization Using Tableau

SELF PACED LEARNING

1. Big Data Programming Prerequisites

Course Advisor

Muthusamy Manigandan

Head Engineering, Amazon India

Mani comes with great experience in Algorithms, Data Science, Big Data, and AI. He has worked on multiple research projects in Data Science, AI & ML for Display Advertising, and Recommendation and Classification systems. He has more than 16 years of experience in building large-scale AI products with top MNCs.

Diwakar Chittora

Co-founder & CEO, Intellipaat

He has more than 11 years of experience in developing large-scale BI

products for Fortune 500 companies. He also has great experience in doing

Data Analytics on large-scale data. In the past, he has worked in companies

such as Amex, Mercedes Benz Research, Pentaho, and Wipro.

David Callaghan

Big Data Strategist and Solutions Architect, Perficient, USA

An experienced Blockchain professional, who has been bringing integrated

Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to

the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud,

Big Data, and Pentaho projects that have had a major impact on the revenues

of marquee brands around the world.

Suresh Paritala

Solutions Architect at Microsoft, Texas

A Senior Software Architect at Microsoft, who has previously worked with

IBM Corporation, Suresh Paritala has worked on Big Data, Data Science,

Advanced Analytics, Internet of Things, and Azure, along with AI domains like

Machine Learning and Deep Learning. He has successfully implemented

high-impact projects in major corporations around the world.

Pre-Requisite Skills

LINUX ADMINISTRATION

Introduction to Linux

Introduction to Linux, Basics of Shell, Basics of Kernel, CentOS 8 installation and VBox additions,

Basic Linux Commands, ECHO and EXPR command, Set and unset a variable, Header of a shell

script (#!)

File Management

Text editors and file creation; Users, Groups and Processes; Root and Linux file hierarchy,

Understanding file hierarchy, Understanding file permissions, chmod and chown commands, the

LS command, Metacharacters, Editing a file using VIM, Displaying contents of a file, Copy, Move

and Remove files

Files and Processes

Everything is a file in UNIX/Linux (files, directories, executables, processes), Process control

commands (ps and kill), other process control tools (top, nice, renice)

Introduction to Shell Scripting

What is shell scripting, Types of shell, Creating and writing a shell script, Changing the permission

of the shell script, Executing the script, Environment variables, Defining a local and a global

variable, User input in a shell script

Conditional, Looping statements and Functions

What are Conditional statements, Using IF, IF-ELSE, Nested IF statements, What are Looping

statements, Using WHILE, UNTIL and FOR statements, Using the case…esac statement, What is

a Function, Creating a function in Linux, Calling functions

Text Processing

Using GREP command, Using SED command, Using AWK command, Mounting a file to the virtual

box, Creating a shared folder (mounting a folder), Using SORT command and Using pipes to

combine multiple Commands

Scheduling Tasks

What are Daemons, Introduction to Task scheduling in Linux, Scheduling a job in Linux, What is

Cron and Crontab, How to use cron, Using the AT command

Linux Networking

What is networking in Linux, Using networking commands – IFCONFIG, PING, Wget and cURL,

SSH, SCP and FTP, Learning Firewall tools – iptables and firewalld, DNS and Resolving IP

address, nslookup and dig

Program Curriculum

STATISTICS FOR ANALYTICS

Introduction to Statistics

Why do we need Statistics?, Categories of Statistics, Statistical Terminology, Types of Data, Measures of Central Tendency, Measures of Spread, Correlation and Covariance, Standardization and Normalization, Probability and the Types of Probability, Hypothesis Testing, Chi-square Testing, ANOVA, Normal Distribution, and Binary Distribution
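To make two of the topics above concrete, here is a minimal Python sketch of standardization (z-scores) and min-max normalization. It uses only the standard library; the function names and the sample data are illustrative assumptions and are not taken from the course material.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    return [(v - mean) / spread for v in values]


def normalize(values):
    """Rescale values linearly into the range [0, 1] (min-max normalization)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Illustrative sample: mean 5, population standard deviation 2
data = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(data)
scaled = normalize(data)
```

Standardization keeps the shape of the data but expresses each value in standard deviations from the mean, while normalization only squeezes the values into a fixed range. Both functions assume the values are not all identical.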

Logistic Regression

Introduction to Logistic Regression, Logistic Regression Concepts, Linear Regression vs

Logistic Regression, Math Behind Logistic Regression, Detailed Formulae, Logit Function

and Odds, Bivariate Logistic Regression, Poisson Regression, Building a Simple ‘Binomial’

Model and Predicting the Result, Confusion Matrix and Accuracy, True Positive Rate,

False Positive Rate, and the Confusion Matrix for Evaluating the Built Model, Threshold

Evaluation with ROCR, Finding the Right Threshold by Building the ROC Plot, Cross

Validation and Multivariate Logistic Regression, Building Logistic Models with Multiple

Independent Variables, and the Real-life Applications of Logistic Regression
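The evaluation measures listed above can be sketched in a few lines. The module itself mentions ROCR, which is an R package; the example below is a plain Python stand-in, and the labels, scores, and function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def rates(actual, scores, threshold):
    """Turn predicted probabilities into labels, then compute the usual rates."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "tpr": tp / (tp + fn),  # true positive rate (sensitivity)
        "fpr": fp / (fp + tn),  # false positive rate
    }


# Hypothetical true labels and model scores
actual = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3]
at_half = rates(actual, scores, threshold=0.5)
at_low = rates(actual, scores, threshold=0.35)
```

Evaluating `rates` over a range of thresholds and plotting `tpr` against `fpr` gives the ROC plot referred to above; the "right" threshold is then a trade-off between the two rates.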

Decision Trees and Random Forest

What is Classification?, Different Classification Techniques, Introduction to Decision Trees,

Algorithms for Decision Tree Induction, Building a Decision Tree in R, Creating a Perfect

Decision Tree, Confusion Matrix, Regression Trees vs Classification Trees, Introduction to

the Ensemble of Trees and Bagging, Random Forest Concept, Implementing Random

Forest in R, What is Naive Bayes?, Computing Probabilities, Understanding the Concept of

Information Gain for the Right Split of Node, Impurity Function – Information Gain,

Understanding the Concept of Gini Index for the Right Split of Node, Impurity Function –

Gini Index, Understanding the Concept of Entropy for the Right Split of Node, Impurity

Function – Entropy, Overfitting and Pruning, Pre-pruning, Post-pruning, Cost-complexity

Pruning, Pruning a Decision Tree and Predicting Values, Finding the Right Number of

Trees, and Evaluating Performance Metrics
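The impurity functions named above are short formulas, and a small sketch may help. The module builds its trees in R; the Python below is only an illustration, and the helper names and the example split are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index: chance that two randomly drawn labels differ."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" labels, split into two children
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
gain = information_gain(parent, children)
```

A candidate split is better when it yields a higher information gain (or, equivalently for the Gini criterion, a lower weighted Gini index in the children).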

DATA PROCESSING TOOLS

Data Analysis Using Excel

Understanding the Concepts of Finance, Concepts of Economics, Hands-on: Inferential

Statistics, Descriptive Statistics, Simple and Multivariate Regression, and Confidence

Intervals

Data Analysis Using SQL

Master Concepts of MySQL, Working with MySQL and MySQL IDE – Installation and Setup, Introduction to SQL Queries – DDL Queries (create, alter, etc.) and DML Queries (select, insert, etc.), Working with Joins, Groups, Filtering Data, Writing Complex SQL Queries for Data Retrieval, and the Import and Export of Data and Database Tables

Python for Analytics

Python Basics

Understanding Python Language, Basic Constructs, Advantages over Other

Languages, etc.

OOPs in Python

Understanding the OOP Paradigm, Access Modifiers, Instances, Class Members, Classes and Objects, Function Parameters and Return Types, and Lambda Expressions

NumPy for Mathematical Computing

Introduction to Mathematical Computing in Python, What are Arrays and Matrices?

Array Indexing, Array Math, Inspecting a NumPy Array, and NumPy Array

Manipulation

SciPy for Scientific Computing

SciPy Concepts, Characteristics of SciPy, Various Sub-packages such as Signal,

Integrate, Fftpack, Cluster, Optimize, Stats, and more, and Bayes Theorem with

SciPy

Data Manipulation

Understanding Data Manipulation in Python, Working with Pandas Library,

DataFrames, Merging Data Objects, Joins, and Cleaning and Visualizing Datasets

Data Visualization with Matplotlib

Understanding Visualization in Python, Plotting Graphs and Charts such as Scatter,

Bar, Pie, Line, Histogram, and more, Matplotlib API, Subplots, and Pandas’ built-in

Data Visualization

Supervised Learning

Understand Machine Learning and Its Types, Supervised Learning and Linear

Regression, Working with Classification Model and Writing Code, Decision Tree,

Confusion Matrix, Random Forest, Naïve Bayes Classifier, Support Vector Machine,

and XGBoost

Unsupervised Learning

Introduction to Unsupervised Learning, Different Types of Clustering - Exclusive,

Overlapping, and Hierarchical, Working with K-means Algorithm and Using Scikit-

Learn, Association Rule Mining, Understanding Market Basket Analysis and Apriori

Algorithm, and Measures in Association Rules

Python Integration with Spark

Introduction to PySpark, PySpark Installation, PySpark Use Cases, and Building

Applications Using PySpark

Python for Spark: Functional and Object-Oriented Model

• Functions

• Lambda Functions

• Global Variables, Their Scope, and Returning Values

• Standard Libraries

• Object-Oriented Concepts

• Modules Used in Python

• The Import Statements

• Module Search Path

• Ways to Install Packages

Hands-On:

• Lambda – Features, Options, Syntax, Compared with the Functions

• Functions – Syntax, Return Values, Arguments, and Keyword Arguments

• Errors and Exceptions – Issue Types, Remediation

• Packages and Modules – Import Options, Modules, sys Path

Apache Spark Framework and RDDs

• Spark Components & its Architecture

• Spark Deployment Modes

• Spark Web UI

• Introduction to PySpark Shell

• Submitting PySpark Job

• Writing your first PySpark Job Using Jupyter Notebook

• What are Spark RDDs?

• Limitations of existing computing methodologies

• How do RDDs solve the problem?

• What are the ways to create RDDs in PySpark?

• RDD persistence and caching

• General operations: Transformation, Actions, and Functions

• Concept of Key-Value pair in RDDs

• Other pair RDDs and two-pair RDDs

• RDD Lineage

• WordCount Program Using RDD Concepts

• RDD Partitioning & How it Helps Achieve Parallelization

• Passing Functions to Spark

Hands-On:

• Building and Running Spark Application

• Spark Application Web UI

• Loading data in RDDs

• Saving data through RDDs

• RDD Transformations

• RDD Actions and Functions

• RDD Partitions

• WordCount program using RDDs in Python
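The WordCount exercise follows a three-step pattern that can be sketched without a cluster. The example below is plain Python, not PySpark: each step is labeled with the RDD operation it imitates (`flatMap`, `map`, `reduceByKey`), and the function name and input lines are hypothetical.

```python
from collections import defaultdict


def word_count(lines):
    """Count words using the same three steps as the classic RDD WordCount."""
    # Step 1, like flatMap: split every line into individual words
    words = (word.lower() for line in lines for word in line.split())
    # Step 2, like map: emit a (word, 1) pair for each word
    pairs = ((word, 1) for word in words)
    # Step 3, like reduceByKey: sum the counts that share the same key
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


# Hypothetical input; in Spark this would be an RDD loaded from a text file
lines = ["big data big ideas", "data moves fast"]
counts = word_count(lines)
```

In Spark the same steps run in parallel across partitions, and nothing is computed until an action such as collecting or saving the result is called.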

PySpark SQL and Data Frames

• Need for Spark SQL

• What is Spark SQL

• Spark SQL Architecture

• SQL Context in Spark SQL

• User-Defined Functions

• Data Frames

• Interoperating with RDDs

• Loading Data through Different Sources

• Performance Tuning

• Spark-Hive Integration

Hands-On:

• Spark SQL – Creating data frames

• Loading and transforming data through different sources

• Spark-Hive Integration

Apache Kafka and Flume

• Why Kafka

• What is Kafka?

• Kafka Workflow

• Kafka Architecture

• Kafka Cluster Configuring

• Kafka Monitoring tools

• Basic operations

• What is Apache Flume?

• Integrating Apache Flume and Apache Kafka

Hands-On:

• Single Broker Kafka Cluster

• Multi-Broker Kafka Cluster

• Topic Operations

• Integrating Apache Flume and Apache Kafka

PySpark Streaming

• Introduction to Spark Streaming

• Features of Spark Streaming

• Spark Streaming Workflow

• StreamingContext Initializing

• Discretized Streams (DStreams)

• Input DStreams, Receivers

• Transformations on DStreams

• DStreams Output Operations

• Windowed Operators and Why They Are Useful

• Stateful Operators

• Important Windowed Operators

• Twitter Sentiment Analysis

• Streaming using Netcat server

• WordCount program using Kafka-Spark Streaming

Hands-On:

• Twitter Sentiment Analysis

• Streaming using Netcat server

• WordCount program using Kafka-Spark Streaming

• Spark-flume Integration

Introduction to PySpark Machine Learning

• Introduction to Machine Learning- What, Why and Where?

• Use Case

• Types of Machine Learning Techniques

• Why use Machine Learning for Spark?

• Applications of Machine Learning (general)

• Applications of Machine Learning with Spark

• Introduction to MLlib

• Features of MLlib and MLlib Tools

• Various ML algorithms supported by MLlib

• Supervised Learning Algorithms

• Unsupervised Learning Algorithms

• ML workflow utilities

Hands-On:

• K- Means Clustering

• Linear Regression

• Logistic Regression

• Decision Tree

• Random Forest

Tuning Models

• Why model tuning?

• What is model tuning?

• What are parameters

• What are Hyper-parameters

• What is Hyper-parameter tuning?

• Types of Hyper-parameter tuning:

• Grid Search

• Random Search

Hands-On:

• Performing Grid Search Hyperparameter Tuning to Increase model accuracy

• Performing Random Search Hyperparameter Tuning to Increase model accuracy
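The difference between the two tuning strategies can be sketched in plain Python. This is only an illustration under stated assumptions: `score` is a made-up stand-in for a cross-validated accuracy, and the parameter names `depth` and `rate` are hypothetical. In practice a library routine would normally be used instead.

```python
import itertools
import random


def score(params):
    """Hypothetical stand-in for model accuracy; best at depth=4, rate=0.1."""
    return 1.0 - abs(params["depth"] - 4) * 0.05 - abs(params["rate"] - 0.1)


def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = sorted(grid)
    combos = itertools.product(*(grid[name] for name in names))
    candidates = [dict(zip(names, values)) for values in combos]
    return max(candidates, key=score)


def random_search(grid, n_iter, seed=0):
    """Evaluate n_iter randomly sampled combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=score)


grid = {"depth": [2, 4, 8], "rate": [0.01, 0.1, 0.5]}
best_grid = grid_search(grid)
best_random = random_search(grid, n_iter=5)
```

Grid search is exhaustive, so its cost grows with the product of all value lists; random search caps the cost at `n_iter` evaluations but may miss the best combination.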

Advanced Models / Ensemble Learning

• Why Ensemble Learning?

• What is Ensemble Learning?

• Model Error

• Bias

• Variance

• Reducing Model Error

• Different Types of Ensemble Learning

• Bagging

• Boosting

• Stacking

Hands-On:

• Creating a Bagging classifier to reduce model error using sklearn

• Creating a Boosting classifier to reduce model error using sklearn

• Creating a Stacking classifier to reduce model error using sklearn

Model Deployment

• What is Model Deployment

• Model Deployment Strategy

• Steps in Model Deployment

• Create a model

• Save it

• Load it in a web server / web API

• Make Predictions

Hands-On:

• Saving and Deploying a model using a Python Flask web API
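The four deployment steps listed above (create, save, load, predict) can be walked through with the standard library alone. This sketch makes several assumptions: the "model" is a simple fitted line stored as JSON, and `handle_request` is a hypothetical stand-in for the web API endpoint that the hands-on builds with Flask.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Step 1, create a model: least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Step 2, save it: write the fitted parameters to disk."""
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    """Step 3, load it: this would run once when the web server starts."""
    with open(path) as handle:
        return json.load(handle)


def handle_request(model, payload):
    """Step 4, make predictions: JSON request in, JSON response out."""
    x = json.loads(payload)["x"]
    prediction = model["slope"] * x + model["intercept"]
    return json.dumps({"prediction": prediction})


# Illustrative training data lying on y = 2x + 1
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, path)
served_model = load_model(path)
response = handle_request(served_model, '{"x": 10}')
```

Separating training from serving this way means the web process only needs the saved file, not the training data or the training code.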

MASTERING BIG DATA ANALYTICS TOOLS

PROGRAMMING REFRESHER COURSE

Java Programming for MapReduce

Concepts of Object-oriented Programming, Understanding and Writing Code Using

Classes, Objects, Functions, Operators, etc., and Core Java Programming Required

for Writing MapReduce Code

Linux Fundamentals

Basic Linux Commands, File Management, Files and Processes, and an Introduction to Shell Scripting, as outlined under Pre-Requisite Skills

BIG DATA HADOOP AND ITS ECOSYSTEMS

Hadoop Installation and Setup

Setting up Hadoop Clustering on a Virtual Machine and the Different Components

Involved

Introduction to Big Data Hadoop, HDFS, and MapReduce

Concepts of Hadoop and Its Ecosystems and Understanding MapReduce, HDFS,

and YARN in Depth

Deep Dive into MapReduce

Working Mechanism of MapReduce, Various Stages in MR, Terminology in MR,

Working with Partitioners, Combiners, and Shuffle and Sort

Introduction to Hive

Introducing Hadoop Hive, Detailed Architecture of Hive, Comparing Hive with Pig

and RDBMS, Working with Hive Query Language, Creation of Databases, Table,

Group by, and Other Clauses, Various Types of Hive Tables, HCatalog, Storing Hive

Results, Hive partitioning, and Buckets

Introduction to Pig

Introducing Pig, Detailed Architecture of Pig, Comparing Hive with Pig, Working

with Pig, Creation of Databases, and Table, Group by, and Other Clauses

Flume & Sqoop

Concepts of Apache Sqoop, Importing and Exporting Data, Performance

Improvement with Sqoop, and Understanding the Concepts of Flume and Its

Architecture

APACHE SPARK AND SCALA

Mastering Scala Programming

What is Scala? Its Use Cases, Advantages over Other Languages, Writing Your

First Scala Code, Working with Classes, Objects, and Functions, Bobsrockets

Package and Comparing Between Mutable and Immutable Collections, Scala REPL,

Lazy Values, Control Structures in Scala, and Directed Acyclic Graph (DAG)

Understanding Spark Framework

What is Apache Spark? Its Various Features, Comparing with Hadoop, Combining

HDFS with Spark, and Spark Architecture

RDDs in Spark

Understanding Spark RDD Operations, Comparison of Spark with MapReduce,

Spark Transformation, Loading Data into Spark, Types of RDD Operations viz.

Transformation and Action, and What is a Key/Value Pair?

DataFrames and Spark SQL

Detailed Understanding of Spark SQL, the Significance of SQL in Spark for Working

with Structured Data Processing, Spark SQL JSON Support, Working with XML Data

and Parquet Files, Creating Hive Context, Writing a DataFrame to Hive, How to

Read a JDBC File?, Significance of a Spark DataFrame, How to Create a

DataFrame? What is Schema Manual Inferring? How to Work with CSV files? JDBC

Table Reading, and Data Conversion from DataFrame to JDBC

Spark SQL

Spark SQL User-defined Functions, Shared Variables and Accumulators, How to

Query and Transform Data in DataFrames? How Does a DataFrame Provide the

Benefits of Both Spark RDD and Spark SQL? and Deploying Hive on Spark as the

Execution Engine

Machine Learning Using Spark (MLlib)

Introduction to Spark MLlib, Understanding Various Algorithms, Iterative Algorithm in

Spark, Spark Graph Processing Analysis, Introducing Machine Learning, K-means

Clustering, Spark Variables such as Shared and Broadcast Variables, What are

Accumulators? Various ML Algorithms Supported by MLlib, Linear Regression,

Logistic Regression, Decision Tree, Random Forest, and Building a

Recommendation Engine Using Spark

STREAMING AND REAL-TIME MESSAGING IN

BIG DATA

Apache Flume and Apache Kafka

Why Kafka? What is Kafka? Kafka Architecture, Kafka Workflow, Configuring Kafka

Clusters, Basic Operations, Kafka Monitoring Tools, and Integrating Apache Flume

and Apache Kafka

Spark Streaming

Introduction to Spark Streaming, Its Architecture and Features, Writing a Spark

Streaming Program, Processing Data Using Spark Streaming, Requesting Count

and DStream, Multi-batch and Sliding-window Operations, Working with Advanced

Data Sources, Spark Streaming Workflow, Initializing the Streaming Context,

Discretized Streams (DStreams), Input DStreams and Receivers, Transformations

on DStreams, Output Operations on DStreams, Windowed Operators and Why It Is

Useful, and Important Windowed Operators and Stateful Operators

NOSQL DATABASE – MONGODB

Introduction to NoSQL Databases and Their Importance

Understanding NoSQL Database, CAP Theorem, Different Types of Distributed

Databases, Querying Without SQL, Introduction to JSON, Querying Using JSON,

and ACID properties

Introduction to NoSQL and MongoDB

What is RDBMS? Challenges of RDBMS, NoSQL Database, Its Significance, How

NoSQL Suits Big Data Needs, Introduction to MongoDB and Its Advantages, JSON

Features, Data Types, and Examples

MongoDB Installation

Installing MongoDB, Basic MongoDB Commands and Operations, MongoChef

(MongoGUI) Installation, and MongoDB Data Types

Working with MongoDB

Base Property, JSON/BSON, MongoDB Write Concern Acknowledged, Replica

Acknowledged, Unacknowledged, Journaling, and Fsync

CRUD Operations

Understanding CRUD and Its Functionality, CRUD Concepts, MongoDB Query and

Syntax, Read and Write Queries, and Query Optimization

Data Modeling and Schema Design

Concepts of Data Modeling, Difference Between MongoDB and RDBMS Modeling,

Model Tree Structure, Operational Strategies, and Monitoring and Backup

Data Management and Administration

MongoDB Administration Activities: Health Check, Backup, Recovery, Database

Sharding and Profiling, Data Import/Export, Performance Tuning, etc.

Data Indexing and Aggregation

Concepts of Data Aggregation and Types, Data Indexing Concepts, Properties, and

Variations

MongoDB Security

Understanding Database Security Risks, MongoDB Security Concept and Security

Approach, and MongoDB Integration with Java and Robomongo

Working with Unstructured Data

Implementing Techniques to Work with a Variety of Unstructured Data such as

Images, Videos, Log Data and Others and Understanding GridFS MongoDB File

System for Storing Data

ADMINISTRATING BIG DATA SYSTEMS

Creation of Multi-node Cluster Setup Using Amazon EC2

Creating a 4-node Hadoop Cluster Setup on Amazon EC2, Running MapReduce

Jobs on the Hadoop Cluster, and Mastering the Cloudera Manager Setup

Cluster Configuration

Cluster Configuration Settings, Various Parameters and Values of Configuration,

HDFS Parameters and MapReduce Parameters, Include and Exclude Configuration

Files, the Administration and Maintenance of the NameNode, DataNode Directory

Structures and Files, and Working with the Edit Log.

Maintenance, Monitoring, and Troubleshooting

How to Maintain, Monitor, and Troubleshoot a Hadoop Cluster, the Checkpoint

Procedure, How to Handle NameNode Failure and Ensure Recovery, Safe Mode,

Metadata and Data Backup, Understanding Potential Problems and Their

Solutions, What to Look for, and How to Add and Remove Nodes

Implementing Security Using Kerberos

Concepts of Security and How to Use Kerberos to Provide Robust Security to a

Hadoop Cluster

BUSINESS INTELLIGENCE AND DATA MINING

Data Warehousing and Data Mining

Understanding Data Warehousing, Use Cases and Applications, What Is Data

Mining, and How to Use Big Data Systems in a Data Warehousing Environment

Creating Data Models for Large Data Warehouses

Different Types of Data Models (Star, Snowflake, and Hybrid) and Choosing the Right

Model for Handling Large Datasets

Integration of Hadoop and Spark with an ETL Tool

How the Informatica ETL Tool Works, Building Workflows Using Informatica for

Integration with HDFS, Hive, MapReduce, etc., and Performance Tuning of ETL

Systems for Processing Large Datasets

DATA VISUALIZATION

Introduction to Data Visualization and the Power of Tableau

Different Types of Data Visualization Techniques, Their Comparison and Benefits,

Reading Raw Numbers Using Tableau, Tableau Interface, and Working with Data

Sources

Architecture of Tableau

Installation of Tableau Desktop, Architecture of Tableau, Interface of Tableau

(Layout, Toolbars, Data Pane, Analytics Pane, etc.), and Share and Export in

Tableau

Working with Metadata and Data Blending

Connection to Excel, Cubes and PDFs, Management of Metadata and Extracts,

Working with Joins (Left, Right, Inner, and Outer) and Union, Dealing with NULL

Values, Cross-database Joining, Data Blending, Refresh Extraction, and Incremental

Extraction

Creation of Sets

Mark, Highlight, Sort, Group, and Use Sets (Creating and Editing Sets, IN/OUT, Sets

in Hierarchies), Constant Sets, Computed Sets, Bins, etc.

Working with Filters

Filters (Addition and Removal), Filtering Continuous Dates, Dimensions, and

Measures, Interactive Filters, Marks Card, Hierarchies, How to Create Folders in

Tableau, Sorting in Tableau, Types of Sorting, Filtering in Tableau, Types of Filters,

Filtering Order of Operations, etc.

Organizing Data and Visual Analytics

Using Formatting Pane to Work with Menu, Fonts, Alignments, Settings, Labels and

Tooltips, Axes and Annotations, K-means Cluster Analysis, Trend Lines,

Forecasting, Confidence Intervals, Reference Lines, and Bands

Working with Mapping

Working on Coordinate Points, Plotting Longitude and Latitude, Editing

Unrecognized Locations, Customizing Geocoding, Polygon Maps, Web Map

Service (WMS), Working with Images, Map Visualization, Custom Territories,

Mapbox, and WMS Maps

Working with Calculations and Expressions

Working with Calculations in Tableau, How to Use Syntax and Functions, LOD

Expressions, Aggregation and Replication with LOD Expressions, Levels of Detail,

Working with Quick Table Calculations, the Creation of Calculated Fields, and

Predefined Calculations

Working with Parameters

How to Use Parameters, Using Parameters with Filters, Using Parameters in

Calculated Fields and Reference Line, etc.

Charts and Graphs

Creating Dual-axis Graphs, Histograms, Box Plot, Pareto, Funnel, Pie, Bar, Line,

Bubble, Bullet, Scatter, and Waterfall Charts, Creating Tree and Heat Maps, and

Implementing Market Basket Analysis (MBA)

Dashboards and Stories

Building and Formatting a Dashboard Using Size, Objects, Views, Filters, and

Legends, Best Practices for Making Dashboards, Creating Stories, Publishing Data

Sources, Live vs Extract Connection, and Various File Types

Tableau Prep

Introduction to Tableau Prep, How Tableau Prep Helps Quickly Combine, Join,

Shape, and Clean Data for Analysis, Getting Deeper Insights into Data with Visual

Experience, Integrating Tableau Prep with Tableau Analytical Workflow, and

Understanding the Seamless Process from Data Preparation to Analysis with

Tableau Prep

Integration of Tableau with Hadoop and Spark

Connecting Tableau with Hadoop and Spark for Data Visualization

CASE STUDIES AND PROJECT WORK IN

THE FOLLOWING DOMAINS

Learners will work on multiple case studies and projects in the

domains listed below:

Marketing, Web, and Social Media Analytics

Fraud and Risk Analytics

Supply Chain and Logistics Analytics

HR Analytics

Certification

After completing the course, students will receive certificates from E&ICT, IIT Guwahati,

and IBM.

Intellipaat Success Stories

Vishal Pentakota

Best part of this online course is the series of hands-on demonstrations the

trainer performed. Not only did he explain each concept theoretically, but also

implemented all those concepts practically. Great job. Must go for beginners.

Shreyashkumar Limbhetwala

I want to talk about the rich LMS that Intellipaat data science training offered.

The extensive set of PPTs, PDFs, and other related course material were of

the highest quality and due to this my learning with Intellipaat was excellent

and I could clear the Cloudera Data Scientist certification in the first attempt.

Giri Karnal

I had taken the Data Science masters’ program which is a combo of SAS, R

and Apache Mahout. Since there are so many technologies involved in the

Data Science course, getting your query resolved at the right time becomes

the most important aspect. But with Intellipaat, there was no such problem as

all my queries were resolved in less than 24 hours.

Sharath Reddy Yellapati

The course material was very well organized. The trainer

explained the basics of each module to me. All my queries were

addressed very clearly. The trainer also made me realize how

important this course is for beginners in IT stream.

Contact Us

INTELLIPAAT SOFTWARE SOLUTIONS PVT. LTD

Bangalore

AMR Tech Park 3, Ground Floor, Tower B, Hongasandra Village, Bommanahalli, Hosur Road, Bangalore – 560068

USA

1219 E. Hillsdale Blvd. Suite 205, Foster City, CA 94404

If you have any further queries or just want to have a conversation with us, do call us:

IND: +91-7022374614 | US: 1-800-216-8930