Certification Program in Big Data Analytics 1 | P a g e
Certification in Big Data Analytics
In collaboration with IBM
Table of Contents
1. About the Program
2. Partnering with E&ICT, IIT Guwahati
3. Collaborating with IBM
4. About Intellipaat
5. Key Features
6. Career Support Services
7. Eligibility Criteria & Application Process
8. Learning Path
9. Course Advisors
10. Program Curriculum
11. Certification
12. Success Stories
13. Contact Us
About the Program
Big Data Analytics is one of the most in-demand and highly paid job profiles in the market. In this
course, you will master skills such as statistics, data manipulation, data analysis, data
visualization, handling static and real-time large volumes of data, Big Data tools, etc. This
certification program in Big Data Analytics by E&ICT, IIT Guwahati, and Intellipaat is
created with an objective to provide high-end skills to working professionals and young
engineers, as well as to help them grow in their career.
This course will provide academic rigor and research elements, along with real-time
industry exposure, through case studies and project work. In this program, you will be
mentored by top industry experts, and sessions will be delivered by professors from top
universities and professionals from the industry.
This certification program is a blend of self-paced online videos, live virtual classes, hands-on
projects, and lab sessions. As part of this program, you will complete 11 courses,
namely, Statistics for Analytics, Data Analytics Using Excel, Data Analytics Using SQL,
Python for Analytics, Basic Java & Linux, Big Data & Hadoop, Apache Spark, MongoDB,
Administrating Big Data Systems, Business Intelligence & Data Mining, and Tableau.
Partnering with E&ICT, IIT Guwahati
This certification program in Big Data Analytics is in partnership with E&ICT Academy, IIT
Guwahati. E&ICT Academy, IIT Guwahati is an initiative of MeitY (Ministry of Electronics and
Information Technology, Govt. of India) and is formed with a team of IIT Guwahati professors
to provide high-quality education programs to working professionals.
Upon the completion of this program, you will:
Receive a joint certificate from E&ICT, IIT Guwahati, and Intellipaat
Have the alumni status of E&ICT, IIT Guwahati
Work on 30+ case studies
Complete a certification program from a top university
Have completed a program delivered in collaboration with IBM
Gain industry-recognized certification from IBM
Have sessions from top professors and industry experts
Collaborating with IBM
IBM is one of the leading innovators and biggest players in creating Big Data
Analytics tools. Top subject matter experts from IBM will share knowledge in the domains
of Data Analytics and Big Data through this training program, helping you gain
breadth of knowledge and industry experience.
Benefits for students from IBM:
Industry-recognized IBM certification
Access to IBM Watson for hands-on training and practice
Industry-aligned case studies and project work
About Intellipaat
Intellipaat is one of the leading e-learning training providers with more than 600,000
learners across 53+ countries. We are on a mission to democratize education as we
believe that everyone has the right to quality education.
Our courses are delivered by subject matter experts from top MNCs, and our world-class
pedagogy enables quick learning of difficult topics. Our 24/7 technical support
and career services will help learners jump-start their careers in their dream companies.
Key Features
400+ HRS INSTRUCTOR-LED TRAINING
24/7 SUPPORT
20+ REAL-TIME CASE STUDIES & PROJECTS
CERTIFICATION PROGRAM FROM A TOP UNIVERSITY
CERTIFICATION FROM IBM
SESSIONS FROM TOP PROFESSORS AND INDUSTRY EXPERTS
CAREER SERVICES
EXECUTIVE ALUMNI STATUS
Career Support
DEDICATED LEARNING MANAGER
Get mentored by experts, receive personalized feedback on your
performance, and clarify your doubts in no time
PERSONALIZED INDUSTRY MENTOR
We match your profile with the right industry mentor based on your
past skills. Your mentor’s guidance will help you prepare yourself
MOCK INTERVIEWS
Mock interviews to prepare you for cracking interviews with top
employers
GUARANTEED INTERVIEWS & JOB SUPPORT
Get interviewed by our 400+ hiring partners and enhance
your chances of getting placed
RESUME PREPARATION
Get assistance in creating a world-class resume from our
career services team
Eligibility Criteria & the Application Process
Those wishing to enroll in this certification program in Big Data Analytics will be required to
follow the admission process mentioned below.
Eligibility Criteria
For the admission to the certification program in Big Data Analytics, candidates should:
Have a bachelor’s degree with an average of 50% or higher marks and a basic
understanding of programming concepts
Be working professionals with zeal to build a career in Big Data Analytics
Application Process
The application process consists of three simple steps. Candidates have to submit their
application. An offer of admission will be made to the selected candidates, and their
application will be accepted upon the payment of the admission fee.
SUBMIT APPLICATION
Tell us a bit about yourself and why you want to join this program
ADMISSION TEST & APPLICATION REVIEW
Clear the admission test and have a personal interview with our
interview panel
ADMISSION LETTER
Shortlisted candidates will be offered the admission letter
Courses offered
INSTRUCTOR LED TRAINING COURSES
1. MongoDB NoSQL and MS SQL Database
2. Statistics for Analytics
3. Data Processing Tools
4. Data Analytics Using Python
5. Hadoop and Its Ecosystems
6. Apache Spark and Scala
7. Python for Spark
8. Python for Spark: Functional and Object-Oriented Model
9. Tuning Models
10. Advanced Models / Ensemble Learning
11. Model Deployment
12. Apache Spark Framework and RDDs
13. PySpark SQL and Data Frames
14. PySpark Streaming
15. Introduction to PySpark Machine Learning
16. Streaming and Real-time Messaging Systems in Big Data
17. Administrating Big Data Systems
18. Business Intelligence and Data Mining
19. Data Visualization Using Tableau
SELF PACED LEARNING
1. Big Data Programming Prerequisites
Course Advisor
Muthusamy Manigandan
Head Engineering, Amazon India
Mani comes with great experience in Algorithms, Data Science, Big Data, and
AI. He has worked on multiple research projects in the past on Data Science,
AI & ML for Display Advertising, and Recommendation and Classification
systems. He comes with more than 16 years of experience in building large-
scale AI products with top MNCs.
Diwakar Chittora
Co-founder & CEO, Intellipaat
He has more than 11 years of experience in developing large-scale BI
products for Fortune 500 companies. He also has great experience in doing
Data Analytics on large-scale data. In the past, he has worked in companies
such as Amex, Mercedes Benz Research, Pentaho, and Wipro.
David Callaghan
Big Data Strategist and Solutions Architect, Perficient, USA
An experienced Blockchain professional, who has been bringing integrated
Blockchain, particularly Hyperledger and Ethereum, and Big Data solutions to
the cloud, David Callaghan has previously worked on Hadoop, AWS Cloud,
Big Data, and Pentaho projects that have had a major impact on the revenues
of marquee brands around the world.
Suresh Paritala
Solutions Architect at Microsoft, Texas
A Senior Software Architect at Microsoft, who has previously worked with
IBM Corporation, Suresh Paritala has worked on Big Data, Data Science,
Advanced Analytics, Internet of Things, and Azure, along with AI domains like
Machine Learning and Deep Learning. He has successfully implemented
high-impact projects in major corporations around the world.
Pre-Requisite Skills
LINUX ADMINISTRATION
Introduction to Linux
Introduction to Linux, Basics of Shell, Basics of Kernel, CentOS 8 installation and VBox additions,
Basic Linux Commands, ECHO and EXPR command, Set and unset a variable, Header of a shell
script (#!)
File Management
Text editors and file creation; Users, Groups and Processes; Root and Linux file hierarchy,
Understanding file hierarchy, Understanding file permissions, chmod and chown commands, the
LS command, Metacharacters, Editing a file using VIM, Displaying contents of a file, Copy, Move
and Remove files
Files and Processes
Everything is a file in UNIX/Linux (files, directories, executables, processes), Process control
commands (ps and kill), other process control tools (top, nice, renice)
Introduction to Shell Scripting
What is shell scripting, Types of shell, Creating and writing a shell script, Changing the permission
of the shell script, Executing the script, Environment variables, Defining a local and a global
variable, User input in a shell script
Conditional, Looping statements and Functions
What are Conditional statements, Using IF, IF-ELSE, Nested IF statements, What are Looping
statements, Using WHILE, UNTIL and FOR statements, Using the case…esac statement, What is
a Function, Creating a function in Linux, Calling functions
Text Processing
Using GREP command, Using SED command, Using AWK command, Mounting a file to the virtual
box, Creating a shared folder (mounting a folder), Using SORT command and Using pipes to
combine multiple Commands
Scheduling Tasks
What are Daemons, Introduction to Task scheduling in Linux, Scheduling a job in Linux, What is
Cron and Crontab, How to use cron, Using the AT command
Linux Networking
What is networking in Linux, Using networking commands – IFCONFIG, PING, Wget and cURL,
SSH, SCP and FTP, Learning Firewall tools – iptables and firewalld, DNS and Resolving IP
address, nslookup and dig
Program Curriculum
STATISTICS FOR ANALYTICS
Introduction to Statistics
Why do we need Statistics?, Categories of Statistics, Statistical Terminology, Types of
Data, Measures of Central Tendency, Measures of Spread, Correlation, and Covariance,
Standardization and Normalization, Probability and the Types of Probability, Hypothesis
Testing, Chi-square Testing, ANOVA, Normal Distribution, and Binomial Distribution
Logistic Regression
Introduction to Logistic Regression, Logistic Regression Concepts, Linear Regression vs
Logistic Regression, Math Behind Logistic Regression, Detailed Formulae, Logit Function
and Odds, Bivariate Logistic Regression, Poisson Regression, Building a Simple ‘Binomial’
Model and Predicting the Result, Confusion Matrix and Accuracy, True Positive Rate,
False Positive Rate, and the Confusion Matrix for Evaluating the Built Model, Threshold
Evaluation with ROCR, Finding the Right Threshold by Building the ROC Plot, Cross
Validation and Multivariate Logistic Regression, Building Logistic Models with Multiple
Independent Variables, and the Real-life Applications of Logistic Regression
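The evaluation ideas listed above (confusion matrix, accuracy, true positive rate, false positive rate, and threshold selection) can be illustrated with a minimal sketch. The module itself works in R with ROCR; the plain-Python version below is only an illustration, and the labels, scores, and function names are made up for the example.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate, accuracy)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return tpr, fpr, accuracy


# Hypothetical data: true classes and the probabilities a logistic model might output.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

# Each threshold gives one (FPR, TPR) point; plotting these points yields the ROC curve.
for threshold in (0.3, 0.5, 0.7):
    tpr, fpr, accuracy = rates(*confusion_counts(y_true, scores, threshold))
    print(f"threshold={threshold}: TPR={tpr:.2f} FPR={fpr:.2f} accuracy={accuracy:.2f}")
```

Lowering the threshold raises both the true and false positive rates, which is the trade-off the ROC plot makes visible.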
Decision Trees and Random Forest
What is Classification?, Different Classification Techniques, Introduction to Decision Trees,
Algorithms for Decision Tree Induction, Building a Decision Tree in R, Creating a Perfect
Decision Tree, Confusion Matrix, Regression Trees vs Classification Trees, Introduction to
the Ensemble of Trees and Bagging, Random Forest Concept, Implementing Random
Forest in R, What is Naive Bayes?, Computing Probabilities, Understanding the Concept of
Information Gain for the Right Split of Node, Impurity Function – Information Gain,
Understanding the Concept of Gini Index for the Right Split of Node, Impurity Function –
Gini Index, Understanding the Concept of Entropy for the Right Split of Node, Impurity
Function – Entropy, Overfitting and Pruning, Pre-pruning, Post-pruning, Cost-complexity
Pruning, Pruning a Decision Tree and Predicting Values, Finding the Right Number of
Trees, and Evaluating Performance Metrics
DATA PROCESSING TOOLS
Data Analysis Using Excel
Understanding the Concepts of Finance, Concepts of Economics, Hands-on: Inferential
Statistics, Descriptive Statistics, Simple and Multivariate Regression, and Confidence
Intervals
Data Analysis Using SQL
Master Concepts of MySQL, Working with MySQL and MySQL IDE – Installation and
Setup, Introduction to SQL Queries – DDL Queries (create and select) and DML Queries
(alter, insert, etc.), Working with Joins, Groups, Filtering Data, Writing Complex SQL
Queries for Data Retrieval, and the Import and Export of Data and Database Tables
Python for Analytics
Python Basics
Understanding Python Language, Basic Constructs, Advantages over Other
Languages, etc.
OOPs in Python
Understanding the OOP Paradigm, Access Modifiers, Instances, Class Members,
Classes and Objects, Function Parameter and Return Type Functions, and Lambda
Expressions
NumPy for Mathematical Computing
Introduction to Mathematical Computing in Python, What are Arrays and Matrices?
Array Indexing, Array Math, Inspecting a NumPy Array, and NumPy Array
Manipulation
SciPy for Scientific Computing
SciPy Concepts, Characteristics of SciPy, Various Sub-packages such as Signal,
Integrate, Fftpack, Cluster, Optimize, Stats, and more, and Bayes Theorem with
SciPy
Data Manipulation
Understanding Data Manipulation in Python, Working with Pandas Library,
DataFrames, Merging Data Objects, Joins, and Cleaning and Visualizing Datasets
Data Visualization with Matplotlib
Understanding Visualization in Python, Plotting Graphs and Charts such as Scatter,
Bar, Pie, Line, Histogram, and more, Matplotlib API, Subplots, and Pandas’ built-in
Data Visualization
Supervised Learning
Understand Machine Learning and Its Types, Supervised Learning and Linear
Regression, Working with Classification Model and Writing Code, Decision Tree,
Confusion Matrix, Random Forest, Naïve Bayes Classifier, Support Vector Machine,
and XGBoost
Unsupervised Learning
Introduction to Unsupervised Learning, Different Types of Clustering - Exclusive,
Overlapping, and Hierarchical, Working with K-means Algorithm and Using Scikit-
Learn, Association Rule Mining, Understanding Market Basket Analysis and Apriori
Algorithm, and Measures in Association Rules
Python Integration with Spark
Introduction to PySpark, PySpark Installation, PySpark Use Cases, and Building
Applications Using PySpark
Python for Spark: Functional and Object-Oriented Model
• Functions
• Lambda Functions
• Global Variables, Their Scope, and Returning Values
• Standard Libraries
• Object-Oriented Concepts
• Modules Used in Python
• The Import Statements
• Module Search Path
• Package Installation Ways
Hands-On:
• Lambda – Features, Options, Syntax, Compared with the Functions
• Functions – Syntax, Return Values, Arguments, and Keyword Arguments
• Errors and Exceptions – Issue Types, Remediation
• Packages and Modules – Import Options, Modules, sys Path
Apache Spark Framework and RDDs
• Spark Components & its Architecture
• Spark Deployment Modes
• Spark Web UI
• Introduction to PySpark Shell
• Submitting PySpark Job
• Writing your first PySpark Job Using Jupyter Notebook
• What are Spark RDDs?
• Stopgaps in existing computing methodologies
• How do RDDs solve the problem?
• What are the ways to create RDD in PySpark?
• RDD persistence and caching
• General operations: Transformation, Actions, and Functions
• Concept of Key-Value pair in RDDs
• Other pair, two pair RDDs
• RDD Lineage
• RDD Persistence
• WordCount Program Using RDD Concepts
• RDD Partitioning & How it Helps Achieve Parallelization
• Passing Functions to Spark
Hands-On:
• Building and Running Spark Application
• Spark Application Web UI
• Loading data in RDDs
• Saving data through RDDs
• RDD Transformations
• RDD Actions and Functions
• RDD Partitions
• WordCount program using RDDs in Python
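As a rough illustration of the WordCount exercise listed above, the sketch below imitates the usual pipeline in plain Python so it runs without a Spark installation. It is not PySpark code: the helper names and sample lines are hypothetical, and the comments only point to the RDD operations (flatMap, map, reduceByKey) that each step stands in for.

```python
from collections import defaultdict


def flat_map_words(lines):
    """Split every line into lowercase words (stands in for RDD.flatMap)."""
    for line in lines:
        for word in line.lower().split():
            yield word


def reduce_by_key(pairs):
    """Sum the values of each key (stands in for RDD.reduceByKey with addition)."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input; in the lab this would be a text file loaded into an RDD.
lines = ["big data tools", "big data analytics", "spark and big data"]

# Pair each word with a count of 1 (stands in for RDD.map).
pairs = ((word, 1) for word in flat_map_words(lines))
counts = reduce_by_key(pairs)

print(sorted(counts.items()))
```

In Spark, the same three steps run in parallel across partitions, with reduceByKey combining the partial counts from each partition.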
PySpark SQL and Data Frames
• Need for Spark SQL
• What is Spark SQL
• Spark SQL Architecture
• SQL Context in Spark SQL
• User-Defined Functions
• Data Frames
• Interoperating with RDDs
• Loading Data through Different Sources
• Performance Tuning
• Spark-Hive Integration
Hands-On:
• Spark SQL – Creating data frames
• Loading and transforming data through different sources
• Spark-Hive Integration
Apache Kafka and Flume
• Why Kafka
• What is Kafka?
• Kafka Workflow
• Kafka Architecture
• Kafka Cluster Configuring
• Kafka Monitoring tools
• Basic operations
• What is Apache Flume?
• Integrating Apache Flume and Apache Kafka
Hands-On:
• Single Broker Kafka Cluster
• Multi-Broker Kafka Cluster
• Topic Operations
• Integrating Apache Flume and Apache Kafka
PySpark Streaming
• Introduction to Spark Streaming
• Features of Spark Streaming
• Spark Streaming Workflow
• StreamingContext Initializing
• Discretized Streams (DStreams)
• Input DStreams, Receivers
• Transformations on DStreams
• DStreams Output Operations
• Windowed Operators and Why They Are Useful
• Stateful Operators
• Vital Windowed Operators
• Twitter Sentiment Analysis
• Streaming using Netcat server
• WordCount program using Kafka-Spark Streaming
Hands-On:
• Twitter Sentiment Analysis
• Streaming using Netcat server
• WordCount program using Kafka-Spark Streaming
• Spark-Flume Integration
Introduction to PySpark Machine Learning
• Introduction to Machine Learning- What, Why and Where?
• Use Case
• Types of Machine Learning Techniques
• Why use Machine Learning for Spark?
• Applications of Machine Learning (general)
• Applications of Machine Learning with Spark
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib
• Supervised Learning Algorithms
• Unsupervised Learning Algorithms
• ML workflow utilities
Hands-On:
• K-Means Clustering
• Linear Regression
• Logistic Regression
• Decision Tree
• Random Forest
Tuning Models
• Why model tuning?
• What is model tuning?
• What are parameters?
• What are Hyper-parameters?
• What is Hyper-parameter tuning?
• Types of Hyper-parameter tuning:
• Grid Search
• Random Search
Hands-On:
• Performing Grid Search Hyperparameter Tuning to Increase model accuracy
• Performing Random Search Hyperparameter Tuning to Increase model accuracy
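The difference between the two tuning strategies above can be sketched in a few lines of plain Python. This is an illustration only: `validation_score` is a made-up stand-in for "train a model with these hyper-parameters and score it on held-out data", and the parameter names and grid values are assumptions chosen for the example.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return 1.0 - (learning_rate - 0.1) ** 2 - 0.01 * (depth - 4) ** 2


# Candidate values for each hyper-parameter (assumed for illustration).
grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}


def grid_search(grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def random_search(grid, n_trials, seed=0):
    """Evaluate only n_trials randomly drawn combinations and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


grid_best = grid_search(grid)                  # 9 evaluations
random_best = random_search(grid, n_trials=4)  # 4 evaluations
print("grid:", grid_best)
print("random:", random_best)
```

Grid search is exhaustive, so its cost grows with the product of the list lengths. Random search caps the number of evaluations and may miss the best combination. In practice, random search often draws from continuous ranges rather than fixed lists.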
Advanced Models / Ensemble Learning
• Why Ensemble Learning?
• What is Ensemble Learning?
• Model Error
• Bias
• Variance
• Reducing Model Error
• Different Types of Ensemble Learning
• Bagging
• Boosting
• Stacking
Hands-On:
• Creating a Bagging classifier to reduce model error using sklearn
• Creating a Boosting classifier to reduce model error using sklearn
• Creating a Stacking classifier to reduce model error using sklearn
Model Deployment
• What is Model Deployment
• Model Deployment Strategy
• Steps in Model Deployment
• Create a model
• Save it
• Load it in a web server/web API
• Make Predictions
Hands-On:
• Saving and Deploying a model using a Python Flask web API
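The four deployment steps listed above (create a model, save it, load it, make predictions) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the "model" is a hand-fitted straight line, it is stored as JSON, and the function names and file name are hypothetical. The hands-on exercise uses a Flask web API, where the load step would typically run once at startup and the predict step inside a request handler; that part is not shown here.

```python
import json
import os
import tempfile


def fit_line(xs, ys):
    """Step 1, create a model: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(model, path):
    """Step 2, save it: write the fitted parameters to disk."""
    with open(path, "w") as handle:
        json.dump(model, handle)


def load_model(path):
    """Step 3, load it: read the parameters back, as a serving process would."""
    with open(path) as handle:
        return json.load(handle)


def predict(model, x):
    """Step 4, make predictions with the loaded parameters."""
    return model["slope"] * x + model["intercept"]


# Hypothetical training data that follows y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

with tempfile.TemporaryDirectory() as folder:
    model_path = os.path.join(folder, "line_model.json")  # hypothetical file name
    save_model(model, model_path)
    loaded = load_model(model_path)

print(predict(loaded, 5))
```

Keeping training and serving separated by a saved file is the point of the exercise: the serving side needs only the saved parameters and the predict function, not the training data.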
MASTERING BIG DATA ANALYTICS TOOLS
PROGRAMMING REFRESHER COURSE
Java Programming for MapReduce
Concepts of Object-oriented Programming, Understanding and Writing Code Using
Classes, Objects, Functions, Operators, etc., and Core Java Programming Required
for Writing MapReduce Code
Linux Fundamentals
Concepts of Object-oriented Programming, Understanding and Writing Code Using
Classes, Objects, Functions, Operators, etc., and Core Linux Programming
Required for Writing MapReduce Code
BIG DATA HADOOP AND ITS ECOSYSTEMS
Hadoop Installation and Setup
Setting up Hadoop Clustering on a Virtual Machine and the Different Components
Involved
Introduction to Big Data Hadoop, HDFS, and MapReduce
Concepts of Hadoop and Its Ecosystems and Understanding MapReduce, HDFS,
and YARN in Depth
Deep Dive into MapReduce
Working Mechanism of MapReduce, Various Stages in MR, Terminology in MR,
Working with Partitioners, Combiners, and Shuffle and Sort
Introduction to Hive
Introducing Hadoop Hive, Detailed Architecture of Hive, Comparing Hive with Pig
and RDBMS, Working with Hive Query Language, Creation of Databases, Table,
Group by, and Other Clauses, Various Types of Hive Tables, HCatalog, Storing Hive
Results, Hive partitioning, and Buckets
Introduction to Pig
Introducing Pig, Detailed Architecture of Pig, Comparing Hive with Pig, Working
with Pig, Creation of Databases, and Table, Group by, and Other Clauses
Flume & Sqoop
Concepts of Apache Sqoop, Importing and Exporting Data, Performance
Improvement with Sqoop, and Understanding the Concepts of Flume and Its
Architecture
APACHE SPARK AND SCALA
Mastering Scala Programming
What is Scala? Its Use Cases, Advantages over Other Languages, Writing Your
First Scala Code, Working with Classes, Objects, and Functions, Bobsrockets
Package and Comparing Between Mutable and Immutable Collections, Scala REPL,
Lazy Values, Control Structures in Scala, and Directed Acyclic Graph (DAG)
Understanding Spark Framework
What is Apache Spark? Its Various Features, Comparing with Hadoop, Combining
HDFS with Spark, and Spark Architecture
RDDs in Spark
Understanding Spark RDD Operations, Comparison of Spark with MapReduce,
Spark Transformation, Loading Data into Spark, Types of RDD Operations viz.
Transformation and Action, and What is a Key/Value Pair?
DataFrames and Spark SQL
Detailed Understanding of Spark SQL, the Significance of SQL in Spark for Working
with Structured Data Processing, Spark SQL JSON Support, Working with XML Data
and Parquet Files, Creating Hive Context, Writing a DataFrame to Hive, How to
Read a JDBC File?, Significance of a Spark DataFrame, How to Create a
DataFrame? What is Schema Manual Inferring? How to Work with CSV files? JDBC
Table Reading, and Data Conversion from DataFrame to JDBC
Spark SQL
Spark SQL User-defined Functions, Shared Variables and Accumulators, How to
Query and Transform Data in DataFrames? How Does a DataFrame Provide the
Benefits of Both Spark RDD and Spark SQL? and Deploying Hive on Spark as the
Execution Engine
Machine Learning Using Spark (MLlib)
Introduction to Spark MLlib, Understanding Various Algorithms, Iterative Algorithm in
Spark, Spark Graph Processing Analysis, Introducing Machine Learning, K-means
Clustering, Spark Variables such as Shared and Broadcast Variables, What are
Accumulators? Various ML Algorithms Supported by MLlib, Linear Regression,
Logistic Regression, Decision Tree, Random Forest, and Building a
Recommendation Engine Using Spark
STREAMING AND REAL-TIME MESSAGING IN
BIG DATA
Apache Flume and Apache Kafka
Why Kafka? What is Kafka? Kafka Architecture, Kafka Workflow, Configuring Kafka
Clusters, Basic Operations, Kafka Monitoring Tools, and Integrating Apache Flume
and Apache Kafka
Spark Streaming
Introduction to Spark Streaming, Its Architecture and Features, Writing a Spark
Streaming Program, Processing Data Using Spark Streaming, Requesting Count
and DStream, Multi-batch and Sliding-window Operations, Working with Advanced
Data Sources, Spark Streaming Workflow, Initializing the Streaming Context,
Discretized Streams (DStreams), Input DStreams and Receivers, Transformations
on DStreams, Output Operations on DStreams, Windowed Operators and Why They Are
Useful, and Important Windowed Operators and Stateful Operators
NOSQL DATABASE – MONGODB
Introduction to NoSQL Databases and Their Importance
Understanding NoSQL Database, CAP Theorem, Different Types of Distributed
Databases, Querying Without SQL, Introduction to JSON, Querying Using JSON,
and ACID properties
Introduction to NoSQL and MongoDB
What is RDBMS? Challenges of RDBMS, NoSQL Database, Its Significance, How
NoSQL Suits Big Data Needs, Introduction to MongoDB and Its Advantages, JSON
Features, Data Types, and Examples
MongoDB Installation
Installing MongoDB, Basic MongoDB Commands and Operations, MongoChef
(MongoGUI) Installation, and MongoDB Data Types
Working with MongoDB
Base Property, JSON/BSON, MongoDB Write Concern Acknowledged, Replica
Acknowledged, Unacknowledged, Journaling, and Fsync
CRUD Operations
Understanding CRUD and Its Functionality, CRUD Concepts, MongoDB Query and
Syntax, Read and Write Queries, and Query Optimization
Data Modeling and Schema Design
Concepts of Data Modeling, Difference Between MongoDB and RDBMS Modeling,
Model Tree Structure, Operational Strategies, and Monitoring and Backup
Data Management and Administration
MongoDB Administration Activities: Health Check, Backup, Recovery, Database
Sharding and Profiling, Data Import/Export, Performance Tuning, etc.
Data Indexing and Aggregation
Concepts of Data Aggregation and Types, Data Indexing Concepts, Properties, and
Variations
MongoDB Security
Understanding Database Security Risks, MongoDB Security Concept and Security
Approach, and MongoDB Integration with Java and Robomongo
Working with Unstructured Data
Implementing Techniques to Work with a Variety of Unstructured Data such as
Images, Videos, Log Data and Others and Understanding GridFS MongoDB File
System for Storing Data
ADMINISTRATING BIG DATA SYSTEMS
Creation of Multi-node Cluster Setup Using Amazon EC2
Creating a 4-node Hadoop Cluster Setup on Amazon EC2, Running MapReduce
Jobs on the Hadoop Cluster, and Mastering the Cloudera Manager Setup
Cluster Configuration
Cluster Configuration Settings, Various Parameters and Values of Configuration,
HDFS Parameters and MapReduce Parameters, Include and Exclude Configuration
Files, the Administration and Maintenance of the NameNode, DataNode Directory
Structures and Files, and Working with the Edit Log.
Maintenance, Monitoring, and Troubleshooting
How to Maintain, Monitor, and Troubleshoot Hadoop Cluster? Checkpoint
Procedure, How to Handle NameNode Failure? How to Ensure the Recovery
Procedure? Safe Mode, Metadata, and Data Backup, Understanding Potential
Problems and Solutions, and What to Look for and How to Add and Remove Nodes?
Implementing Security Using Kerberos
Concepts of Security and How to Use Kerberos to Provide Robust Security to
Hadoop Cluster
BUSINESS INTELLIGENCE AND DATA MINING
Data Warehousing and Data Mining
Understanding Data Warehousing, Use Cases and Applications, What is Data
Mining? How to Use Big Data Systems in Data Warehousing Environment
Creating Data Models for Large Data Warehouses
Different Types of Data Models - Star, Snowflake, and Hybrid - Which is the Right
Model for Handling Large Datasets?
Integration of Hadoop and Spark with ETL Tool
How does the Informatica ETL tool work? Building Workflows Using Informatica for
Integration with HDFS, Hive, MapReduce, etc., and Performance Tuning of ETL
Systems for Processing Large Datasets
DATA VISUALIZATION
Introduction to Data Visualization and Power of Tableau
Different Types of Data Visualization Techniques, Comparison and Their Benefits,
Reading Raw Numbers Using Tableau, Tableau Interface, and Working with Data
Sources
Architecture of Tableau
Installation of Tableau Desktop, Architecture of Tableau, Interface of Tableau
(Layout, Toolbars, Data Pane, Analytics Pane, etc.), and Share and Export in
Tableau
Working with Metadata and Data Blending
Connection to Excel, Cubes and PDFs, Management of Metadata and Extracts,
Working with Joins (Left, Right, Inner, and Outer) and Union, Dealing with NULL
Values, Cross-database Joining, Data Blending, Refresh Extraction, and Incremental
Extraction
Creation of Sets
Mark, Highlight, Sort, Group, and Use Sets (Creating and Editing Sets, IN/OUT, Sets
in Hierarchies), Constant Sets, Computed Sets, Bins, etc.
Working with Filters
Filters (Addition and Removal), Filtering Continuous Dates, Dimensions, and
Measures, Interactive Filters, Marks Card, Hierarchies, How to Create Folders in
Tableau, Sorting in Tableau, Types of Sorting, Filtering in Tableau, Types of Filters,
Filtering the Order of Operations, etc.
Organizing Data and Visual Analytics
Using Formatting Pane to Work with Menu, Fonts, Alignments, Settings, Labels and
Tooltips, Axes and Annotations, K-means Cluster Analysis, Trend and Reference
Lines, Forecasting, Confidence Interval, Reference Lines, and Bands
Working with Mapping
Working on Coordinate Points, Plotting Longitude and Latitude, Editing
Unrecognized Locations, Customizing Geocoding, Polygon Maps, Web Mapping
Services (WMS), Working with Images, Map Visualization, Custom Territories, Map
Box, and WMS Map
Working with Calculations and Expressions
Working with Calculations in Tableau, How to Use Syntax and Functions, LOD
Expressions, Aggregation and Replication with LOD Expressions, Levels of Details,
Working with Quick Table Calculations, the Creation of Calculated Fields, and
Predefined Calculations
Working with Parameters
How to Use Parameters, Using Parameters with Filters, Using Parameters in
Calculated Fields and Reference Line, etc.
Charts and Graphs
Creating Dual-axes Graphs, Histograms, Box Plot, Pareto, Funnel, Pie, Bar, Line,
Bubble, Bullet, Scatter, and Waterfall Charts, Creating Tree and Heat Maps, and
Implementing Market Basket Analysis (MBA)
Dashboards and Stories
Building and Formatting a Dashboard Using Size, Objects, Views, Filters, and
Legends, Best Practices for Making Dashboards, Creating Stories, Publishing Data
Sources, Live vs Extract Connection, and Various File Types
Tableau Prep
Introduction to Tableau Prep, How Tableau Prep Helps Quickly Combine, Join,
Shape, and Clean Data for Analysis, Getting Deeper Insights into Data with Visual
Experience, Integrating Tableau Prep with Tableau Analytical Workflow, and
Understanding the Seamless Process from Data Preparation to Analysis with
Tableau Prep
Integration of Tableau with Hadoop and Spark
Connecting Tableau with Hadoop and Spark for Data Visualization
CASE STUDIES AND PROJECT WORK IN
THE DOMAINS BELOW
Learners will work on multiple case studies and projects in the
domains listed below:
Marketing, Web, and Social Media Analytics
Fraud and Risk Analytics
Supply Chain and Logistics Analytics
HR Analytics
Certification
After the completion of the course, students will get certificates from E&ICT, IIT Guwahati,
and IBM.
Intellipaat Success Stories
Vishal Pentakota
Best part of this online course is the series of hands-on demonstrations the
trainer performed. Not only did he explain each concept theoretically, but also
implemented all those concepts practically. Great job. Must go for beginners.
Shreyashkumar Limbhetwala
I want to talk about the rich LMS that Intellipaat data science training offered.
The extensive set of PPTs, PDFs, and other related course material were of
the highest quality and due to this my learning with Intellipaat was excellent
and I could clear the Cloudera Data Scientist certification in the first attempt.
Giri Karnal
I had taken the Data Science masters’ program which is a combo of SAS, R
and Apache Mahout. Since there are so many technologies involved in the
Data Science course, getting your query resolved at the right time becomes
the most important aspect. But with Intellipaat, there was no such problem as
all my queries were resolved in less than 24 hours.
Sharath Reddy Yellapati
The course material was very well organized. The trainer
explained the basics of each module to me. All my queries were
addressed very clearly. The trainer also made me realize how
important this course is for beginners in IT stream.
Contact Us
INTELLIPAAT SOFTWARE SOLUTIONS PVT. LTD
Bangalore
AMR Tech Park 3, Ground Floor, Tower B, Hongasandra Village, Bommanahalli, Hosur Road, Bangalore – 560068
USA
1219 E. Hillsdale Blvd. Suite 205, Foster City, CA 94404
If you have any further queries or just want to have a conversation with us, then do call us
IND: +91-7022374614 | US: 1-800-216-8930