This presentation covers data science buzzwords, an introduction to big data, predictive analytics, and model-building methods, including structured vs. unstructured data and supervised vs. unsupervised learning.
Ben Taylor @bentaylordata
Predictive Analytics / Data Science
Presentation Objectives
• Enable you to be smarter than your prospect (data history / lingo)
• Motivate you to be unstoppable and hyper-confident
• Motivate you to begin looking for data driven opportunities
• Motivate you to become a data scientist
"What the hell is cloud computing?" - Larry Ellison, CEO, Oracle
What is cloud computing?
What is big data?
Big data includes datasets or problems that exceed the capacity of a single computer and require a distributed data access system.
The concept of "big" is relative to conventional systems and technology, and it will shift as memory and storage solutions advance.
http://www.pcmag.com/article2/0,2817,2453838,00.asp
Big data trends
What is a data scientist?
Engineering, Finance, Economics, Mathematics, Computer Science, Physics
Data Science: 6-10 yrs
Python Bootcamp: $8,000 (3 months)
$16,000-$4,000 (3 months)
$115K avg
What is a data scientist?
Master Builder
What is a data scientist?
Reality distortion: Hyper-confidence
Data Scientist = Peacock
Humans vs. Algorithms: Smartest pirate
Humans vs. Algorithms: NA
Humans vs. Algorithms: German (1795), French (1806)
Humans vs. Algorithms: 1997, Kasparov vs. IBM Deep Blue
Humans vs. Algorithms: 2011, Ken Jennings & Brad Rutter vs. IBM Watson
Humans vs. Algorithms: 2014, Hiring panel vs. HireVue Iris
Prediction process
Raw data → Data munging → Training → Model
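The flow above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not the presenter's pipeline: the CSV columns (`years_experience`, `notes`, `salary`), the helper names `munge` and `train`, and the choice of one-feature least squares regression as the training step are all hypothetical.

```python
import csv
import io

# Hypothetical raw export: one numeric feature, a free-text column we ignore, and a target.
RAW = """years_experience,notes,salary
1,junior,45
2,,50
,missing years,60
3,senior-ish,55
4,,60
"""

def munge(raw_text, feature, target):
    """Data munging (hypothetical helper): parse, clean, and select one feature."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Data cleaning: skip rows where the feature or the target is blank.
        if not record[feature] or not record[target]:
            continue
        # Feature selection: keep only the column we model on, plus the target.
        rows.append((float(record[feature]), float(record[target])))
    return rows

def train(rows):
    """Training (hypothetical helper): fit y = a + b*x by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x  # the "model" is simply a callable

model = train(munge(RAW, "years_experience", "salary"))
print(model(5))  # 65.0
```

Keeping each stage as its own function mirrors the boxes on the slide: swapping in a different training method only touches `train`.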
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Clean data → Training → Model
Numeric Excel example
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
Training methods: LSR, SVM, Random Forest, Naïve Bayesian, Neural Net
Missing values + categorical
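Missing values and categorical columns are the two cleaning problems called out on this slide. A minimal sketch of two common fixes, mean imputation and one-hot encoding, follows; the rows, the column names `gpa` and `industry`, and both helper functions are hypothetical, and these are only two of several possible techniques.

```python
from statistics import mean

# Hypothetical rows: a numeric column with a gap (None) and a categorical column.
rows = [
    {"gpa": 3.2, "industry": "Retail"},
    {"gpa": None, "industry": "Engineering"},
    {"gpa": 3.8, "industry": "Retail"},
    {"gpa": 3.5, "industry": "Finance"},
]

def impute_mean(rows, column):
    """Replace missing numeric values with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Expand a categorical column into one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for c in categories:
            out[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(out)
    return encoded

clean = one_hot(impute_mean(rows, "gpa"), "industry")
print(clean[1])
```

After both steps every value is numeric, which is what most of the training methods listed above expect.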
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
Training methods: LSR, SVM, Random Forest, Naïve Bayesian, Neural Net
Retail > 15, Engineering > 95
> 5.67
Resume model
Prediction process
Raw data → Data munging (data cleaning, feature selection) → Training → Model
Training methods: LSR, SVM, Random Forest, Naïve Bayesian, Neural Net
Retail > 15, Engineering > 95
GPA, Colleges, Hobbies
> 5.67
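Feature selection decides which resume attributes (GPA, colleges, hobbies) the model actually uses. One simple filter method, swapped in here for illustration and not necessarily the presenter's, ranks candidate features by the absolute Pearson correlation with an outcome. The feature values, the outcome score, and the helper names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, target):
    """Sort feature names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical resume-derived features and outcome scores, for illustration only.
features = {
    "gpa": [2.8, 3.1, 3.4, 3.7, 3.9],
    "colleges": [1, 2, 1, 2, 1],
    "hobbies": [5, 1, 4, 2, 3],
}
outcome = [60, 65, 72, 80, 85]
print(rank_features(features, outcome))  # ['gpa', 'hobbies', 'colleges']
```

Correlation only captures linear, one-feature-at-a-time relationships, so in practice it is a first pass rather than the final word.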
Text deeper dive
Sentiment example
Sentiment
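The slides do not say how sentiment is computed. The simplest approach, counting words from positive and negative word lists, is sketched below as an assumption; the tiny `POSITIVE` and `NEGATIVE` sets and the `sentiment` helper are hypothetical, and real systems use much larger lexicons or trained classifiers.

```python
import re

# Hypothetical, tiny word lists; real sentiment lexicons are far larger.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "hate", "terrible", "sad", "poor"}

def sentiment(text):
    """Score = (#positive words - #negative words); the sign picks the label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(sentiment("I love this product, the support team is great!"))  # ('positive', 2)
```

Word counting ignores negation ("not good") and sarcasm, which is one reason text gets a "deeper dive".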
Given data, find: cat or dog?
Talk like a data nerd
Confidence & Over-fitting
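A model that memorizes its training data looks perfect on that data and poor on anything new, which is why confidence should come from held-out data. The sketch below shows the gap with a deliberately over-fit lookup-table "model"; the data, the every-third-row split, and all helper names are hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical labeled data: label is "high" when x >= 5, else "low".
data = [(x, "high" if x >= 5 else "low") for x in range(10)]

def holdout_split(rows, every=3):
    """Deterministic split: rows at index every-1, 2*every-1, ... go to the test set."""
    train = [r for i, r in enumerate(rows) if i % every != every - 1]
    test = [r for i, r in enumerate(rows) if i % every == every - 1]
    return train, test

def fit_memorizer(rows):
    """Over-fit 'model': exact lookup of training points, majority label otherwise."""
    table = dict(rows)
    fallback = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

train, test = holdout_split(data)
model = fit_memorizer(train)
print(accuracy(model, train), accuracy(model, test))  # 1.0 0.3333333333333333
```

Perfect training accuracy next to weak held-out accuracy is the classic over-fitting signature.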
Data Lingo: Supervised vs. unsupervised learning
Supervised: A labeled training set (inputs with known outcomes) is provided.
Unsupervised: No training set; records are clustered based on similar attributes.
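To make "clustering based on similar attributes" concrete, here is a minimal one-dimensional k-means sketch. k-means is one common clustering algorithm, chosen here as an assumption; the points, starting centers, and the `kmeans_1d` name are hypothetical. No labels are supplied: values are grouped purely by distance.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points: assign to nearest center, move centers to means."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# No labels: the algorithm groups values only by similarity.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)  # [1.5, 10.0] [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

A supervised method would instead be handed the group labels up front and learn to reproduce them.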
Data Lingo: Analytic layers
Descriptive Analytics: Telling a data story through plotting or visualization.
Predictive Analytics: Predicting future outcomes, usually with a model trained on a historical training set.
Prescriptive Analytics: Using the insight from your predictive model to proactively change something.
Interview/Interaction Analytics: Any analytics surrounding the interview or interaction.
Data Lingo: Prediction methods
Regression: Predicting a continuous output (e.g., a stock price).
Classification: Predicting discrete category outputs, e.g., Yes/Maybe/No.
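The regression/classification split is about the type of output, not the algorithm. The sketch below uses k-nearest neighbors (swapped in for illustration; it is not on the slide's method list) for both: averaging neighbors gives a continuous number, voting gives a discrete label. The training rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical training rows: (feature, continuous target, category label).
train = [
    (1.0, 10.0, "No"),
    (2.0, 12.0, "No"),
    (3.0, 20.0, "Maybe"),
    (4.0, 24.0, "Maybe"),
    (5.0, 30.0, "Yes"),
    (6.0, 34.0, "Yes"),
]

def neighbors(x, k):
    """The k training rows whose feature value is closest to x."""
    return sorted(train, key=lambda row: abs(row[0] - x))[:k]

def predict_regression(x, k=2):
    """Regression: average the neighbors' continuous targets."""
    return sum(row[1] for row in neighbors(x, k)) / k

def predict_classification(x, k=3):
    """Classification: majority vote over the neighbors' discrete labels."""
    return Counter(row[2] for row in neighbors(x, k)).most_common(1)[0][0]

print(predict_regression(5.5), predict_classification(5.5))  # 32.0 Yes
```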
Data Lingo: Data types
Structured: Does it play well in Excel?
Unstructured: Raw text (Twitter), audio, video, photos, resumes, etc.
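Unstructured text usually has to become structured (rows and columns that "play well in Excel") before a model can use it. A bag-of-words count table is one common way to do that; the tweets and the `bag_of_words` helper below are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

def bag_of_words(documents):
    """Turn free text into a table: one row per document, one column per word."""
    tokenized = [re.findall(r"[a-z0-9']+", doc.lower()) for doc in documents]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows

# Hypothetical tweet-like snippets.
tweets = ["Data science is fun", "Big data is big"]
vocab, table = bag_of_words(tweets)
print(vocab)  # ['big', 'data', 'fun', 'is', 'science']
print(table)  # [[0, 1, 1, 1, 1], [2, 1, 0, 1, 0]]
```

Word order is lost, but the result is an ordinary numeric table that any of the structured-data methods above can consume.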