© 2015 Hudson Data Corp. All Rights Reserved. www.bitbootcamp.com
Our Training Offerings: Skills you need
Algorithms: The Brains
– Introduction to Data Science
– Data Munging & Fusion
– Text Mining
  • Naïve Bayes
– Recommendation Engines
– Principal Component Analysis
– Classification
  • Decision Trees
  • Random Forest
  • Gradient Boosting Machines
– Generalized Linear Models
– Clustering
  • KNN
  • K-Means
– Graph Theory
– Stable Marriage
Hadoop: Big Data
Core: Engineering
Training Overview: Evening Classes
Big Data Track 1: Big Data
Big Data Track 2: Machine Learning
Big Data Training: 4-week intensive big data evening classes
Week 1 | Week 2 | Week 3 | Week 4 | Self Study
Certifications: Complete the industry-standard Hadoop certification.
For Data Science
Track 1
Machine Learning Training: 6-week data science evening classes
Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6
• Introduction to Machine Learning
• Recommendation Engines: Collaborative Filtering
• Gradient Boosting Machines
• Data Fusion and Fuzzy Matching
• Principal Component Analysis
• Graph Theory and Stable Marriage
• Generalized Linear Models: Linear Regression, Regularization, Logistic Regression
• Decision Trees
• Text Mining: Naive Bayes
• Random Forests
• Clustering: KNN, K-Means
Data Aggregation Project | Data Science Project | Career Counseling
For Data Science
Track 2
Big Data Track 1: Big Data
Big Data Training: Master the basics
Week 1 (Monday and Wednesday, 6:30 PM)
Introductions
• Motivation for Big Data
• Unix for Data Science
• Pushing and pulling data from remote servers
• Columnar Compression
• Extended Data Dictionary
Pulling and Processing Data
• SQL overview
• SQL design patterns for data analytics
  o Pivot Tables
  o Aggregation
  o Network Analysis
Unix Assignments
• Process data in parallel
• Working with remote machines
SQL Assignments
• Five key design patterns
• Joins, Aggregation, Temp Tables, Indexes, Functions
Big Data Training: Spin up the cluster
Week 2 (Monday and Wednesday, 6:30 PM)
Cluster Setup
• Introduction to the Big Data Ecosystem
• Acquire 5 machines in AWS
• Prepare machines for Hadoop
• Set up a 5 to 10 node cluster
• Say Hello to Hadoop
Introduction to Hadoop
• Motivation for Hadoop
• HDFS
• ETL in Hadoop with a large dataset
• Sqoop
• Oozie
• Hadoop Streaming
Cluster Setup Assignment
• Set up a cluster in the cloud
• Develop automation scripts
ETL in Hadoop
• N-gram data in Hadoop
• Develop ETL jobs in the cluster
Big Data Training: Wrangle millions of records in Hadoop
Week 3 (Monday and Wednesday, 6:30 PM)
Hive
• Motivation for Hive
• Hive architecture
• Aggregation and data selection
• Hive and Python integration
Advanced Hive
• Hive jobs and variables
• Custom functions
• Custom data types
• Indexing and performance issues
Hive Assignment
• Data aggregation
Hive Assignment 2
• N-gram data in Hadoop
• Develop ETL jobs in the cluster
Big Data Training: Hadoop under the hood with MapReduce
Week 4 (Monday and Wednesday, 6:30 PM)
Hadoop MapReduce
• Motivation for MapReduce
• MapReduce in action
• MapReduce API
• Splitters and combiners
• Custom data formats
Advanced MapReduce
• Distributed joins
• Data compression in MapReduce
• Optimizations
• Debugging and tracing
M/R Assignment
• Data aggregation
• Extended data dictionaries
M/R Assignment 2
• N-gram data in Hadoop
• Develop ETL jobs in the cluster
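To make the map, shuffle/sort and reduce phases concrete, here is a minimal single-process sketch of an n-gram count, the kind of job the N-gram assignments describe. It is not Hadoop code: with Hadoop Streaming the mapper and reducer would run as separate scripts reading stdin and writing tab-separated key/value lines, and the framework would do the sort. The function names and sample lines are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line, n=2):
    """Emit (n-gram, 1) for every run of n consecutive words in one line."""
    words = line.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1


def reducer(key, counts):
    """Sum every count emitted for one n-gram key."""
    return key, sum(counts)


def run_job(lines, n=2):
    # Map phase: apply the mapper to every input line.
    pairs = [kv for line in lines for kv in mapper(line, n)]
    # Shuffle/sort phase: bring identical keys together, as Hadoop does
    # between the map and reduce phases.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, (count for _, count in group))
                for key, group in groupby(pairs, key=itemgetter(0)))


print(run_job(["big data is big", "big data wins"]))
```

A combiner would apply the same summation to each mapper's local output before the shuffle, which shrinks the data sent across the network.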
Big Data Track 2: Machine Learning
Machine Learning
Week 1 (Tuesday and Thursday, 6:30 PM)
Introduction to ML & Unix
• Motivation for Machine Learning (ML)
• Geometric, Probabilistic and Logical Models
• Standardized ML Model lifecycle
• Unix for Data Science
• Pushing and pulling data from remote servers
• Extended Data Dictionary
Python for Data Science
• Thinking in Python
• Python design patterns for data analytics
• Pandas
• Data Frames
• Aggregations
• Scripting in Python
1. Unix Assignments
• Data processing in Unix
• Data processing in parallel
• Working with remote machines
2. SQL & Python Assignments
• Data processing in Python
• Data processing in SQL
Machine Learning
Week 2 (Tuesday and Thursday, 6:30 PM)
Decision Trees
• Motivation for Decision Trees
• ID3, C4.5 and CART
• Entropy, Information Gain
• Pruning and Purging
• Trees in action
Text Mining / Naïve Bayes
• Motivation for Text Mining
• Working with unstructured datasets
• Tokenization and standardization of text
• Naïve Bayes
• Applications and Results
3. Titanic Survivors
• Who is most likely to survive the Titanic disaster?
4. Classify Recipes by Ingredients
• Analyzing ingredients to identify the origin of a cuisine
• Data munging
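As an illustration of the Entropy and Information Gain topic under Decision Trees, here is a minimal stdlib-only sketch of the split criterion used by ID3-style trees. The labels and feature values are hypothetical toy data loosely styled after the Titanic assignment, not course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    # p * log2(1/p) summed over classes; written this way so a pure group gives 0.0
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())


def information_gain(labels, feature):
    """Parent entropy minus the size-weighted entropy of the child groups
    obtained by splitting `labels` on the matching values in `feature`."""
    groups = {}
    for label, value in zip(labels, feature):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    children = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - children


# Hypothetical toy data.
survived = ["yes", "yes", "no", "no", "no", "yes"]
sex = ["f", "f", "m", "m", "m", "f"]
cabin_deck = ["A", "B", "A", "B", "A", "B"]
print(entropy(survived))                      # 1.0
print(information_gain(survived, sex))        # 1.0 (a perfect split)
print(information_gain(survived, cabin_deck))
```

A tree builder would evaluate this gain for every candidate feature and split on the largest; C4.5 and CART use related but different criteria (gain ratio and Gini impurity).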
Machine Learning
Week 3 (Tuesday and Thursday, 6:30 PM)
Recommendation Engine
• Motivation for Recommendation Engines
• Sparse matrix operations
• Manhattan Distance, Euclidean Distance, Cosine Distance
• Similarity Matrices and results
Random Forest
• Motivation for Random Forest
• Vote by democracy / Variable Importance
• Random Forest in Action
• Industry Use Cases
5. Predict Customer Churn
• Data Munging
• Telecom customer churn model development
• Validate the model
6. Collaborative Filtering
• Identify similar analytical topics based on the Hacker News feed
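The three distance measures listed under Recommendation Engine can be sketched in plain Python as below, together with a cosine similarity matrix between users. This is a minimal illustration that assumes small dense lists; a real recommender would use sparse matrix libraries, as the sparse-operations topic suggests. The ratings matrix is hypothetical.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_matrix(rows):
    """Pairwise cosine similarity between every pair of rows."""
    return [[1.0 - cosine_distance(r, s) for s in rows] for r in rows]


# Hypothetical user-by-item ratings (0 = not rated).
ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
]
print(manhattan(ratings[0], ratings[1]))   # 4
print(euclidean([0, 0], [3, 4]))           # 5.0
for row in similarity_matrix(ratings):
    print([round(v, 3) for v in row])
```

Cosine distance ignores vector length, so two users who rate the same items in the same proportions look identical even if one rates everything higher.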
Machine Learning
Week 4 (Tuesday and Thursday, 6:30 PM)
Principal Component Analysis
• Motivation for Principal Component Analysis
• Curse of dimensionality
• Best practices for dimensionality reduction
• Use cases and applications
Gradient Boosting Machines (GBM)
• Motivation for GBM
• Boosting vs. Bagging
• Residual error and tree generation
• Metric search for the best GBM trees
• GBM in action
• Industry Use Cases
7. GBM Assignment
• Data Munging
• Telecom customer churn model development
• Compare Random Forest and GBM
8. Image Processing
• Reduce dimensions in image data
• Classify images into categories
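For the Principal Component Analysis topic, the sketch below finds the first principal component by centering the data, forming the covariance matrix and running power iteration on it. In practice PCA is usually computed with a library SVD (for example NumPy or scikit-learn); this plain-Python version only illustrates the idea, and the data points are hypothetical.

```python
import math
import random


def center(data):
    """Subtract column means so every feature has mean zero."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=500, seed=0):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]  # positive random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """1-D scores: dot product of each centered row with the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]


# Hypothetical 2-D data with strongly correlated features.
data = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [4.0, 3.9], [0.0, 0.1]]
centered = center(data)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
print([round(c, 3) for c in pc1])
```

Keeping only `scores` reduces each 2-D point to one number while retaining almost all of the variance, which is the dimensionality-reduction idea the image-processing assignment applies at larger scale.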
Machine Learning
Week 5 (Tuesday and Thursday, 6:30 PM)
Generalized Linear Models
• Linear Regression
• Regularization (Ridge, Lasso)
• Logistic Regression
• Generalized Linear Models
• Feature Selection
• Industry Use Case
Clustering: KNN & K-Means
• Motivation for unsupervised learning methods
• Intuition behind KNN and Applications
• Intuition behind K-Means and Applications
• Multi-class classification
• Hierarchical Clustering
9. Regression Models
• Predict housing prices
• Data munging
10. Clustering
• Clustering around flower types
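The intuition behind K-Means can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. For a reproducible example this sketch seeds centroids with a simple farthest-point heuristic (an assumption; k-means++ is the common randomized choice). The points are hypothetical.

```python
import math


def init_farthest(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical 2-D measurements forming two obvious groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Unlike K-Means, KNN does not build clusters: it labels a new point by a vote among its k nearest labeled neighbors, reusing the same distance function.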
Machine Learning
Week 6 (Tuesday and Thursday, 6:30 PM)
Graph Theory and Stable Marriage
• Master key graph theory metrics
• Bipartite graphs
• Visualizing graphs with Gephi
• Motivation for matching algorithms with preferences
• Preferences on both sides
• Incomplete lists and ties
• Industry Use Cases
11. Data Fusion
• Fuzzy matching on names and addresses
• Data Munging
12. Graph Theory, Stable Marriage
• Determine stable pairs between two groups based on preferences
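The stable-pairs problem in assignment 12 is classically solved by Gale-Shapley deferred acceptance. The sketch below assumes complete, strict preference lists on both sides, so it does not cover the incomplete-list and tie variants mentioned above. The participants and preferences are hypothetical.

```python
def stable_match(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance with complete, strict preferences.

    Returns a dict mapping each receiver to its matched proposer.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver to propose to
    engaged = {}                                  # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p                # r was unmatched: accept tentatively
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p                # r trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)                # r rejects p; p tries its next choice
    return engaged


# Hypothetical preferences.
proposers = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
receivers = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
matching = stable_match(proposers, receivers)
print(matching)
```

The result is stable: no proposer and receiver both prefer each other to their assigned partners. It is also the best stable outcome for the proposing side, so swapping which group proposes can change the pairs.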
Data Fusion and Fuzzy Matching
• Merging data sets from multiple sources
• Probabilistic and Deterministic Matching
• String Fuzzy Matching: Edit Distances, Jaro-Winkler Distance
• Fuzzy Address Matching
• Swap-in / Swap-out Analysis
• Industry Use Cases
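For the Edit Distances topic, here is a minimal sketch of Levenshtein distance computed by dynamic programming, plus a normalized similarity score of the kind used to decide whether two names refer to the same entity. It covers edit distance only; Jaro-Winkler is a different measure and is not shown. The `similarity` helper and the sample strings are hypothetical illustrations.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute, free if equal
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))     # 3
print(similarity("Jon Smith", "John Smith"))
```

A matching pipeline would typically normalize case and punctuation first, then accept pairs whose score clears a tuned threshold.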
Contact Us
[email protected]
917-819-0106
www.bitbootcamp.com
25 Broadway, Suite 1032
New York, NY
Made in NYC