What is Data Science? - UCSC Astro Grads Wiki

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

What is Data Science?What is Data Science?

Why should I care about Data Science?

[Figure: Seth+2009]

Most people in this room will not get a tenure-track faculty position!

“The Sexiest Job of the 21st Century” (Harvard Business Review)

Taken from Insight Data Fellows White Paper

So what is Data Science?

So what is a Data Scientist?

Someone who knows more stats than a computer scientist and more programming than a statistician, and who can use a bunch of data to ask and answer the right questions

Slide Taken From Scott Nicholson

Credit: Hilary Mason

● Asking and answering the right questions

● Discovering what we don't know from data

● Obtaining predictive, actionable insights from data

● Creating data products that have business impact now

● Communicating relevant business stories from data

● Building confidence in decisions that drive business value

Some Points Taken from Carlos Somohano's Talk

What is Data Science?

Using data to create actionable insights and products

Examples of Data Science

● Google's Search Algorithm & Advertisements
● Helping people find products, movies, news stories, music, … they want
  – Amazon, eBay, Pandora, Netflix, Yahoo, ...
● LinkedIn
  – Helping people find a job
● Facebook
  – Helping people interact with their friends
● Using past patient data to improve diagnosis or alert doctors to a patient's particular needs
  – Helping people not die

Examples of Data Science

● Figuring out what your project should be about
  – Is it doable? Will it have an impact?
● And actually doing it
  – Implementing the code to perform the analysis that you need to do
● And communicating it
  – Via presentations, plots, or text

Sound Familiar yet?

Examples of Data Science

● Distinguishing interesting targets in an image from background stars
● Using observed luminosities in bands (or a spectrum) to determine a galaxy's age, mass, and/or metallicity
● Automated galaxy morphological classification
● PCA of anything
● Fitting curves
● Any application of machine learning
● Even if you don't know those words, you have probably used these methods without realizing it
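“PCA of anything” can be made concrete with a small sketch. The hypothetical helper `first_principal_component` below centres the data, builds the sample covariance matrix, and uses power iteration to find the direction of greatest variance. It assumes plain Python lists and a made-up toy data set; in practice a library routine (for example an SVD) would normally do this job.

```python
import math


def first_principal_component(points, iters=200):
    """Direction of greatest variance of `points` (a list of equal-length tuples).

    Power iteration on the sample covariance matrix. Assumes the points are not
    all identical and that the starting vector is not orthogonal to the answer.
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting guess
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalise to unit length
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x
pc = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
print(pc)  # a unit vector along (1, 2)
```

Projecting each centred point onto this vector gives its one-dimensional “PCA coordinate”; further components can be found the same way after removing this direction.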

Day-to-Day Life of a Data Scientist

The same as what you are doing now!

● Coding
● Learning new techniques
● Reading about the field
● Working in a collaborative team
● Working with cool data
● Presenting your work, listening to talks
● Group meetings

Why I Love Astronomy

● Interesting Problems
● Collaborative Environment
● Working with Data
● Analysis
● Figuring out the Right Questions to Ask
● Creativity
● Sharing my work with others

Why I Love Data Science

● Interesting Problems
● Collaborative Environment
● Working with Data
● Analysis
● Figuring out the Right Questions to Ask
● Creativity
● Sharing my work with others
● Large Impact
● Staying in the Bay Area
● Security to Start a Family
● Money

What does a Data Science Job Look Like?

Salary: 80k–120k + 5% stock + 5% bonus

[Figure: salary without a PhD vs. with a PhD]

Location: San Francisco, SoMa, Bay Area, everywhere

Start-up:
● Opportunity to grow
● Bigger risk / higher reward
● More exciting
● More ownership of your contribution
● More/cross-functional responsibilities
● Longer hours

Established:
● Security
● Harder to move up to high levels
● Perks can include:
  – 5 months maternity/paternity leave
  – Free lunch from a Michelin chef
  – Bike/car maintenance
  – Dry cleaning
  – As many vacation days as you want

Hours: 40 hr/week normally; can be 60 hr/week at crunch time

Key Differences From Astronomy

● Not working on astronomical problems
● Astronomy is slow to progress
● Papers take 6–12 months
● Data science projects take days, weeks, or months
  – 80/20 rule: 80% of the results with 20% of the work
  – Don't find the “right” answer; find a “good enough” answer
● Data science projects can easily impact >100 million people, compared to roughly 50 in astronomy
● More social than astronomy
● More collaborative within a team
● Work with people outside your team and your realm of expertise
  – Designer, product manager, etc.
● Strong emphasis on machine learning

What is Machine Learning?

A branch of artificial intelligence concerned with the construction of systems that can learn from data

If you ever fit a line, you have “machine learned”!

Being lazy: letting the computer write your programs for you
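To make “if you ever fit a line, you have machine learned” concrete: the hypothetical `fit_line` below learns the two parameters of y = a + b·x from example pairs by ordinary least squares (closed form), then uses them to predict y at an x it has never seen. Plain Python and made-up toy data are assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# "Training data": points generated from y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
prediction = a + b * 10  # predict y for an unseen x = 10
print(a, b, prediction)
```

The “program” the computer wrote here is just the pair (a, b); the fitting rule is what turns examples into that program.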

What is Machine Learning?

● Supervised Learning
  ● Regression
    – Neural Networks
    – Least Squares
  ● Classification
    - Support Vector Machines
    - Rule Induction
    - Bayesian Classifiers
    - Logistic Regression
● Useful Methods
  – Dimensionality Reduction
  – Greedy Search / Gradient Descent / Optimization
  – Regularization (aka setting a prior)
  – Model Ensembles / Bagging / Stacking
  – Neural Networks
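Two of the useful methods listed above, gradient descent and regularization, can be shown together in a minimal sketch. The hypothetical `fit_line_gd` below fits y ≈ a + b·x by gradient descent on the mean squared error plus an L2 penalty `lam * b**2` on the slope (the intercept is left unpenalized, a common but assumed convention). Such a penalty corresponds to a zero-mean Gaussian prior on b, which is the sense in which regularization is “setting a prior”. The learning rate, step count, and toy data are assumptions.

```python
def fit_line_gd(xs, ys, lam=0.0, lr=0.05, steps=5000):
    """Fit y ~ a + b*x by gradient descent on MSE + lam * b**2 (intercept unpenalized)."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [a + b * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the loss with respect to a and b
        grad_a = 2 * sum(residuals) / n
        grad_b = 2 * sum(r * x for r, x in zip(residuals, xs)) / n + 2 * lam * b
        # Step downhill
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
print(fit_line_gd(xs, ys))           # close to (1.0, 2.0), the least-squares line
print(fit_line_gd(xs, ys, lam=1.0))  # slope shrunk toward 0 (about 1.11)
```

Gradient descent is overkill for a straight line, which has a closed-form answer, but the same loop works for models (such as neural networks) that have no closed form.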

Given examples of (x, f(x)), predict the function f(x) for new examples of x. Both x and f(x) can be of any dimension.

[Figure: example points (x, f(x)) and a new input x0]

What is Machine Learning?

● Supervised Learning
  ● Regression
    – Symbolic
    – Least Squares
  ● Classification
    - Support Vector Machines
    - Rule Induction
    - Bayesian Classifiers
    - Logistic Regression
  ● Natural Language Processing
  ● Instance Based Learning
    – Recommender Systems
    – K-nearest Neighbors

Symbolic Regression with Genetic Algorithms
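The slide for this topic was a figure that did not survive extraction. As a hedged illustration of the general idea only (not the speaker's method), the sketch below evolves formulas built from x, the constants 1 and 2, and the operators +, - and * to fit toy data generated from y = x² + x. It uses tournament selection, subtree crossover, random-subtree mutation, and elitism; every name, the operator set, and all settings (population 200, 30 generations, depth limit 5) are assumptions.

```python
import random

# Assumed building blocks for candidate formulas
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 5


def random_tree(rng, depth):
    """A terminal, or a tuple (op, left, right) at most `depth` levels deep."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree == "x":
        return x
    if not isinstance(tree, tuple):
        return tree  # numeric constant
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])


def graft(tree, new, rng):
    """Replace one randomly chosen subtree of `tree` with `new`."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return new
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, graft(left, new, rng), right)
    return (op, left, graft(right, new, rng))


def mse(tree, data):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)


def evolve(data, pop_size=200, generations=30, n_elite=10, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 4) for _ in range(pop_size)]
    history = []  # best error in each generation
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, data))  # best first
        history.append(mse(pop[0], data))
        children = pop[:n_elite]  # elitism: the best survive unchanged
        while len(children) < pop_size:
            # Tournament selection: the lowest index of 3 random picks is the fittest
            mom = pop[min(rng.sample(range(pop_size), 3))]
            dad = pop[min(rng.sample(range(pop_size), 3))]
            child = graft(mom, rng.choice(list(subtrees(dad))), rng)  # crossover
            if rng.random() < 0.2:
                child = graft(child, random_tree(rng, 2), rng)  # mutation
            if depth(child) <= MAX_DEPTH:  # reject bloated formulas
                children.append(child)
        pop = children
    pop.sort(key=lambda t: mse(t, data))
    history.append(mse(pop[0], data))
    return pop[0], history


data = [(x, x * x + x) for x in range(-5, 6)]
best, history = evolve(data)
print(best, history[-1])
```

Because the elite formulas are copied forward unchanged, the best error can only stay the same or fall from one generation to the next; whether a run lands on x*x + x exactly depends on the seed and settings.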

What is Machine Learning?

● Supervised Learning
  ● Classification
    - Support Vector Machines
    - Rule Induction
    - Decision Trees
    - Bayesian Classifiers
    - Logistic Regression

Machine Learning

Male     6  Poor
Female  24  Rich
Male    26  Poor
Male    36  Rich
Female  66  Poor
Male     6  Poor
Female  12  Rich
Female  36  Poor
Male    36  Poor

New case: Male 18 → Rich ???
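The outline slide just before this example adds Decision Trees, so one plausible reading is a tree classifier. The sketch below is a minimal CART-style tree (Gini impurity, midpoint thresholds, depth limited to 2) trained on the nine rows above. It assumes the number on each row is an age and encodes sex as is_male = 1 or 0; the function names and the depth limit are hypothetical choices, not the speaker's code.

```python
from collections import Counter

# Rows from the slide, encoded as (is_male, age), with their labels
X = [(1, 6), (0, 24), (1, 26), (1, 36), (0, 66), (1, 6), (0, 12), (0, 36), (1, 36)]
y = ["Poor", "Rich", "Poor", "Rich", "Poor", "Poor", "Rich", "Poor", "Poor"]


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for a mixed one."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """(feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=2):
    """Leaf = majority label; internal node = (feature, threshold, left, right)."""
    split = best_split(X, y)
    if depth == max_depth or split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g], depth + 1, max_depth)
    return (f, t, left, right)


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


tree = build_tree(X, y)
print(tree)                    # (1, 9.0, 'Poor', (1, 25.0, 'Rich', 'Poor'))
print(predict(tree, (1, 18)))  # Rich
```

The learned rule is “age ≤ 9: Poor; 9 < age ≤ 25: Rich; otherwise Poor”, so the new case (Male, 18) lands in the Rich leaf. The tree gets 8 of the 9 training rows right; the miss is unavoidable because the two “Male 36” rows carry opposite labels.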

We do the same thing in astronomy!

Small  Red   25
Small  Red   24
Big    Red   17
Small  Blue  14
Big    Blue  17
Small  Red   22
Big    Red   20
Small  Red   21
Small  Red   22
Small  Blue  15
Small  Blue  15

What is Machine Learning?

● Supervised Learning
  ● Instance Based Learning
    – K-nearest Neighbors
    – Recommender Systems

Will I Like This Movie?
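“Will I like this movie?” is a natural fit for the instance-based methods above. The minimal sketch below predicts a user's rating for an unseen movie as the average rating given by the k most similar users (k-nearest neighbours, user-based collaborative filtering). The ratings table, the movie titles, and the helper names are all hypothetical, and raw cosine similarity on unadjusted ratings is a simplification; real recommenders usually centre each user's ratings first.

```python
import math

# Hypothetical star ratings (1-5); each user lists only the movies they have seen
ratings = {
    "me":    {"Alien": 5, "Up": 1, "Heat": 4},
    "alice": {"Alien": 5, "Up": 2, "Heat": 5, "Gravity": 5},
    "bob":   {"Alien": 1, "Up": 5, "Heat": 2, "Gravity": 1},
    "carol": {"Alien": 4, "Up": 1, "Heat": 4, "Gravity": 4},
}


def similarity(u, v):
    """Cosine similarity of two users' ratings over the movies both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(u[m] ** 2 for m in common))
    norm_v = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)


def nearest_neighbours(user, movie, k=2):
    """The k users most similar to `user` among those who have rated `movie`."""
    others = [name for name in ratings if name != user and movie in ratings[name]]
    others.sort(key=lambda name: similarity(ratings[user], ratings[name]), reverse=True)
    return others[:k]


def predict_rating(user, movie, k=2):
    """Average rating that the k nearest neighbours gave `movie`."""
    neighbours = nearest_neighbours(user, movie, k)
    return sum(ratings[n][movie] for n in neighbours) / len(neighbours)


print(nearest_neighbours("me", "Gravity"))  # ['carol', 'alice']
print(predict_rating("me", "Gravity"))      # 4.5
```

“me” rates like carol and alice and unlike bob, so the prediction follows their high ratings of the unseen movie.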

What is Machine Learning?

● Unsupervised Learning
  ● Clustering
    – K-means clustering
    – Social Network Analysis
● Useful Methods
  – Dimensionality Reduction; PCA
  – Greedy Search / Gradient Descent / Optimization
  – Regularization (aka setting a prior)
  – Model Ensembles / Bagging / Stacking
  – Neural Networks
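In unsupervised learning there are no labels; clustering just groups similar points. The sketch below is a minimal version of Lloyd's algorithm for k-means, alternating an assignment step and an update step until the centres stop moving. Initialising the centres with the first k points and the toy data are assumptions made to keep it short and deterministic (real implementations usually use smarter or random initialisation); `math.dist` needs Python 3.8+.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. Initial centres are simply the first k points (an assumption)."""
    centres = [list(p) for p in points[:k]]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (keep it if empty)
        new_centres = [
            [sum(coord) / len(cluster) for coord in zip(*cluster)] if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: assignments can no longer change
            break
        centres = new_centres
    return centres, clusters


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centres, clusters = kmeans(points, k=2)
print(centres)  # approximately [[0.333, 0.333], [10.333, 10.333]]
```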

Applesauce for vegetable oil???

What should I do if I am interested in Data Science?

Start working NOW!

How to Prepare for Data Science

● Don't use IDL or Fortran!!!
  – Python, R, Java instead
● Slug's Guide to Getting a Job
  – Email me for access
● Coursera
  – Machine Learning courses
  – Introduction to Data Science
● Data Science meet-ups
● Kaggle

How to Prepare for Data Science

● AMS Designated Emphasis requirements
  – AMS 203: Introduction to Probability Theory
  – AMS 206: Classical Bayesian Inference
  – AMS 207: Intermediate Bayesian Statistics
  – AMS 256: Linear Statistical Models
  – One elective from:
    - 205B: Intermediate Classical Inference
    - 206B: Intermediate Bayesian Inference
    - 221: Bayesian Decision Theory
    - 223: Time Series Analysis
    - 225: Multivariate Statistical Methods
    - 241: Bayesian Nonparametric Methods
    - 245: Spatial Statistics
    - 274: Generalized Linear Models

Theory, etc.:
  - 261: Probability Theory with Markov Chains
  - 263: Stochastic Processes
  - 291: Advanced Topics in Bayesian Statistics

What Do You Want Next?

● Resume workshop
● Machine learning seminar

Email me for access, or for any questions: [email protected]