
BitBootCamp Evening Classes


© 2015 Hudson Data Corp. All Rights Reserved. www.bitbootcamp.com

Our Training Offerings: Skills you need

Algorithms: The Brains
– Introduction to Data Science
– Data Munging & Fusion
– Text Mining
  • Naïve Bayes
– Recommendation Engines
– Principal Component Analysis
– Classification
  • Decision Trees
  • Random Forest
  • Gradient Boosting Machines
– Generalized Linear Models
– Clustering
  • KNN
  • K-Means
– Graph Theory
– Stable Marriage

Hadoop: Big Data

Core: Engineering


Training Overview: Evening Classes

Track 1: Big Data

Track 2: Machine Learning


Big Data Training: 4-week intensive big data evening classes

Track 1

Week 1 | Week 2 | Week 3 | Week 4 | Self Study

Certifications: Complete the industry-standard Hadoop certification

For Data Science


Machine Learning Training: 6-week data science evening classes

Track 2

Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6

• Introduction to Machine Learning
• Recommendation Engines: Collaborative Filtering
• Gradient Boosting Machines
• Data Fusion and Fuzzy Matching
• Principal Component Analysis
• Graph Theory and Stable Marriage
• Generalized Linear Models: Linear Regression, Regularization, Logistic Regression
• Decision Trees
• Text Mining: Naive Bayes
• Random Forests
• Clustering: KNN, K-Means

For Data Science

Data Aggregation Project | Data Science Project | Career Counseling

Track 1: Big Data



Week 1

Big Data Training: Master the basics

Monday - 6:30 PM | Wednesday - 6:30 PM

Introductions
• Motivation for Big Data
• Unix for Data Science
• Pushing and pulling data from remote servers
• Columnar Compressions
• Extended Data Dictionary

Pulling and Processing Data
• SQL overview
• SQL design patterns for data analytics
  o Pivot Tables
  o Aggregation
  o Network Analysis

Unix Assignments
• Process data in parallel
• Working with remote machines

SQL Assignments
• Five key design patterns
• Joins, Aggregation, Temp Tables, Indexes, Functions


Week 2

Big Data Training: Spin up the cluster

Monday - 6:30 PM | Wednesday - 6:30 PM

Cluster Setup
• Introduction to Big Data Ecosystem
• Acquire 5 machines in AWS
• Prepare machines for Hadoop
• Set up a 5 to 10 node cluster
• Say Hello to Hadoop

Introduction to Hadoop
• Motivation for Hadoop
• HDFS
• ETL in Hadoop with large datasets
• SQOOP
• OOZIE
• Hadoop Streaming

Cluster Setup Assignment
• Set up a cluster in the cloud
• Develop automation scripts

ETL in Hadoop
• N-gram data in Hadoop
• Develop ETL jobs in cluster


Week 3

Big Data Training: Wrangle millions of records in Hadoop

Monday - 6:30 PM | Wednesday - 6:30 PM

Hive
• Motivation for Hive
• Hive architecture
• Aggregation and data selection
• Hive and Python integration

Advanced Hive
• Hive jobs and variables
• Custom functions
• Custom data types
• Indexing and performance issues

Hive Assignment
• Data aggregation

Hive Assignment 2
• N-gram data in Hadoop
• Develop ETL jobs in cluster


Week 4

Big Data Training: Hadoop under the hood with Map Reduce

Monday - 6:30 PM | Wednesday - 6:30 PM

Hadoop Map Reduce
• Motivation for Map Reduce
• Map Reduce in action
• Map Reduce API
• Splitter and Combiners
• Custom data format

Advanced Map Reduce
• Distributed Joins
• Data Compression in Map Reduce
• Optimizations
• Debugging and Tracing

M/R Assignment
• Data aggregation
• Extended Data Dictionaries

M/R Assignment 2
• N-gram data in Hadoop
• Develop ETL jobs in cluster
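
The map, shuffle, and reduce steps behind the Map Reduce sessions and the N-gram assignment can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `bigram_counts`) and the sample lines are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit ((word, next_word), 1) for every adjacent word pair."""
    words = line.lower().split()
    for first, second in zip(words, words[1:]):
        yield (first, second), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one bigram."""
    return key, sum(values)


def bigram_counts(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, not course data.
lines = ["big data big data", "big cluster"]
counts = bigram_counts(lines)
```

On a real cluster the mapper and reducer would run on different machines and the shuffle would be handled by the framework; the sketch only shows how the three stages fit together.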

Track 2: Machine Learning



Week 1

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Introduction to ML & Unix
• Motivation for Machine Learning (ML)
• Geometric, Probabilistic and Logical Models
• Standardized ML model lifecycle
• Unix for Data Science
• Pushing and pulling data from remote servers
• Extended Data Dictionary

Python for Data Science
• Thinking in Python
• Python design patterns for data analytics
• Pandas
• Data Frames
• Aggregations
• Scripting in Python

1. Unix Assignments
• Data processing in Unix
• Data processing in parallel
• Working with remote machines

2. SQL & Python Assignments
• Data processing in Python
• Data processing in SQL


Week 2

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Decision Trees
• Motivation for Decision Trees
• ID3, C4.5 and CART
• Entropy, Information Gain
• Pruning and Purging
• Trees in Action

Text Mining / Naïve Bayes
• Motivation for Text Mining
• Working with unstructured datasets
• Tokenization and standardization of text
• Naïve Bayes
• Applications and Results

3. Titanic Survivors
• Who is most likely to survive the Titanic disaster?
• Data munging

4. Classify recipes by ingredients
• Analyzing ingredients to identify the origin of a cuisine
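
The entropy and information gain topic listed under Decision Trees can be made concrete with a short Python sketch. It assumes categorical features stored as dictionaries; the helper names and the toy weather-style data are hypothetical and do not come from the course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

An ID3-style tree would pick the feature with the larger gain as its next split; here that is `outlook`, since splitting on it leaves both child nodes pure.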


Week 3

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Recommendation Engine
• Motivation for Recommendation Engines
• Sparse matrix operations
• Manhattan Distance, Euclidean Distance, Cosine Distance
• Similarity matrices and results

Random Forest
• Motivation for Random Forest
• Vote by democracy / Variable Importance
• Random Forest in Action
• Industry Use Cases

5. Predict Customer Churn
• Data Munging
• Telecom customer churn model development
• Validate the model

6. Collaborative Filter
• Identify similar analytical topics based on the Hacker News feed
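
The three distance measures named under Recommendation Engine (Manhattan, Euclidean, Cosine) can be written in a few lines of Python. This is a minimal sketch on dense lists, assuming equal-length, non-zero vectors; the rating vectors `alice` and `bob` are hypothetical.

```python
from math import sqrt


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical item ratings for two users (0 means "not rated").
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]

d_manhattan = manhattan(alice, bob)
d_euclidean = euclidean(alice, bob)
d_cosine = cosine_distance(alice, bob)
```

Cosine distance ignores vector length, so two users who rate the same items in the same proportions score as identical even if one rates consistently higher; Manhattan and Euclidean distances do not have that property.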


Week 4

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Principal Component Analysis
• Motivation for Principal Component Analysis
• Curse of dimensionality
• Best practices for dimensionality reduction
• Use cases and applications

Gradient Boosting Machines (GBM)
• Motivation for GBM
• Boosting vs. Bagging
• Residual error and tree generations
• Metrics Search for best GBM Trees
• GBM in Action
• Industry Use Cases

7. GBM Assignment
• Data Munging
• Telecom customer churn model development
• Compare Random Forest and GBM

8. Image processing
• Reduce dimensions in image data
• Classify images into categories


Week 5

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Generalized Linear Models
• Linear Regression
• Regularization (Ridge, Lasso)
• Logistic Regression
• Generalized Linear Models
• Feature Selection
• Industry Use Case

Clustering: KNN & K-Means
• Motivation for unsupervised learning methods
• Intuition behind KNN and applications
• Intuition behind K-Means and applications
• Multi-class classification
• Hierarchical Clustering

9. Regression models
• Predict housing prices
• Data munging

10. Clustering
• Clustering around flower types


Week 6

Machine Learning

Tuesday - 6:30 PM | Thursday - 6:30 PM

Graph Theory and Stable Marriage
• Master key graph theory metrics
• Bipartite graphs
• Visualizing graphs with Gephi
• Motivations for matching algorithms with preferences
• Preferences with both parties
• Incomplete lists and ties
• Industry Use Cases

Data Fusion and Fuzzy Matching
• Merging data sets from multiple sources
• Probabilistic and Deterministic Matching
• String Fuzzy Matching
  - Edit Distances, Jaro-Winkler Distance
• Fuzzy Address Matching
• Swap-in / Swap-out analysis
• Industry Use Cases

11. Data Fusion
• Fuzzy matching on names and addresses
• Data Munging

12. Graph Theory, Stable Marriage
• Determine stable pairs between two groups based on preferences
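
For the Stable Marriage topic this week, one well-known way to determine stable pairs between two groups is the Gale-Shapley deferred-acceptance algorithm. The sketch below is an illustration only and is not necessarily the method used in class. It assumes two groups of equal size with complete, strictly ordered preference lists, so it does not cover the incomplete lists and ties mentioned in the syllabus. The function name and the sample preferences are hypothetical.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley deferred acceptance; returns {receiver: proposer}."""
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged = {}  # receiver -> currently accepted proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p  # receiver was unmatched, so accept
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p  # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)  # rejected; p will try the next choice later
    return engaged


# Hypothetical preferences: both proposers want "x" first.
proposer_prefs = {"a": ["x", "y"], "b": ["x", "y"]}
receiver_prefs = {"x": ["b", "a"], "y": ["a", "b"]}
matching = stable_matching(proposer_prefs, receiver_prefs)
```

In this example `x` prefers `b`, so `a` is turned down by `x` and pairs with `y`. No proposer and receiver would both rather be with each other than with their assigned partners, which is what makes the result stable.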


Contact Us

[email protected]
917-819-0106
www.bitbootcamp.com
25 Broadway, Suite 1032
New York, NY

Made in NYC