
What Classes Should I Take if I Want to Become a Data Scientist? - Quora


12/24/2015 What classes should I take if I want to become a data scientist? - Quora

https://www.quora.com/What-classes-should-I-take-if-I-want-to-become-a-data-scientist


What classes should I take if I want to become a data scientist?

Given the reported talent gap for data scientists (http://www.emc.com/collateral/ab... ), how should universities and industry be training people?

More specific versions of this question for particular universities. (feel free to add yours!)

What classes should I take at Berkeley if I want to become a data scientist?
What classes should I take at Brown if I want to become a data scientist?
What classes should I take at Caltech if I want to become a data scientist?
What classes should I take at CMU if I want to become a data scientist?
What classes should I take at Cornell if I want to become a data scientist?
What classes should I take at Duke if I want to become a data scientist?
What classes should I take at Georgia Tech if I want to become a data scientist?
What classes should I take at Harvard if I want to become a data scientist?
What classes should I take at MIT if I want to become a data scientist?
What classes should I take at Princeton if I want to become a data scientist?
What classes should I take at Stanford if I want to become a data scientist?
What classes should I take at UCLA if I want to become a data scientist?
What classes should I take at the University of Chicago if I want to become a data scientist?
What classes should I take at UT Austin if I want to become a data scientist?
What classes should I take at Yale if I want to become a data scientist?
What classes should I take at IISc (Indian Institute of Science) if I want to become a data scientist?


Answer Wiki

26 Answers

William Chen, Data Scientist at Quora
22.2k Views • Upvoted by Sean Owen, Director, Data Science @ Cloudera
William is a Most Viewed Writer in Data Science.

A data science curriculum should mostly be a combination of Statistics and Computer Science classes, with additional relevant classes from other departments (e.g. Applied Math, Math, Econ).

Here are my suggestions on a full curriculum for a data science program:


Introduction
- One year of multivariable calculus and linear algebra / matrix algebra
- One year of intro CS
- One year of intro probability and inference

Core Classes
- Data science
- Machine learning
- Linear modeling
- Predictive modeling

Stats electives
- More linear models
- Time series analysis
- Statistical software
- Experimental design
- Survey analysis
- Causal inference
- Bayesian data analysis
- Nonparametric methods

CS electives
- Theory of computation / Analysis of algorithms
- Data structures and algorithms
- Software engineering
- Visualization
- Parallel programming / Massive computation (for processing huge datasets)
- Network analysis
- More machine learning
- Economics + Computer Science (game theory, auction design)

Other electives
- (Convex) optimization
- Behavioral economics

Thank you to those in the comments for suggesting additions to the list!

For my answer on what major you should choose if you want to be a data scientist, check out What major should I choose if I want to be a data scientist?

This answer is part of What is the Data Science topic FAQ?

Updated Feb 3 • View Upvotes

287

Mark Meloon, Senior Data Scientist at Impetus
2.4k Views • Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Mark has 30+ answers in Data Science.

First, there's a difference between developing data products to be consumed by people versus those consumed by other machines. But I'm assuming you mean the former, so that's what I'll talk about.

There are a lot of great answers here, but I just want to highlight a few aspects that don't get nearly as much attention as they should.

- Causality: Dr. Anonymous below deserves more upvotes. At the end of the day, your audience wants actionable information. If you don't give it to them, you have failed (more on this below). We're in a situation now where data scientists do predictive modeling (based on historical data), decision makers act based on that, and the result is... well, no one knows. That action was never in the model.
- Conversational skills: Read Guy Cuthbert's answer. He points out a very important, and woefully neglected, set of skills, namely being able to have a conversation with non-specialists. I wrote about the importance of this in detail in my answer to What is a data scientist's career path? Most data scientists who claim to be "great communicators" are merely skilled at one-way information transfer, such as writing and presenting. That doesn't cut it in data science for reasons I explain. Guy's suggestion of Rhetoric and Cognitive Psychology is right on the money. I have yet to see a university or MOOC do anything but pay lip service to this critical aspect of data science.
- Understand Data: Great answers by Alex Leavitt and Adam Marcus. Mathematician John Allen Paulos has a great book entitled "Innumeracy" that details just how poorly most people understand probabilities and other mathematical concepts (see Synopses of Innumeracy, Math and Humor, and His Other Books). You've got to be able to grok all this at a deep level.

And now for non-coursework: it is a very good idea to do some projects of your own interest to demonstrate your initiative, your passion for the subject, and that you are a self-starter. Note that a class project doesn't count (see Data Science Interview: I Don't Care About Your Class Projects). Personally, I don't particularly care about Kaggle competitions either (see Mark Meloon's answer to How useful are Kaggle competitions for getting interviews for someone already working as a data scientist?). I'm much more interested in projects of your own design and those that demonstrate your ability to work well in a team environment.

There's more, of course, but the other commenters on this page have done an excellent job of covering those. My bio is "Data Science: the straight, no-hype truth" and I feel compelled to point out that data science is far more than sitting in front of your computer all day, geeking out on using the most sophisticated algorithm you can think of to spit out results.

Finally, go for it! Data science is way cool and there really is nothing quite like it. The criticism that it's merely a sexed-up version of statistics is way off. Yeah, training to become one has unique challenges, but it'll be worth it in the end.

And keep asking questions on Quora. There's a slew of extremely knowledgeable people here who are very eager to help!

-Mark

Written Jan 31 • View Upvotes

12

Rahul Agarwal, Data Scientist at Citi
2.7k Views • Upvoted by Ryan Fox Squire, Neuroscientist Turned Data Scientist
Rahul is a Most Viewed Writer in Big Data.

I can only tell you what I have done so far and what I intend to work on additionally to become a better data scientist.

What follows is my own data science curriculum. It is aimed at Computer Science with a specialization in Machine Learning.

My main aim here is to learn about Mathematics, Statistics, Computer Science and Machine Learning, though not necessarily in that order.

I have categorized the courses here into two types:

1. F - Foundational Class
2. A - Advanced Specialization

MATHEMATICS:

(F1) Linear Algebra by Gilbert Strang:

A great class by a great teacher. I would definitely recommend this class to anyone who wants to learn linear algebra.

(F2) Multivariate Calculus - MIT OCW: TODO

COMPUTER SCIENCE:

(F1) CS50x: Introduction to Computer Science, Harvard

This is an introduction to computer science class taught by David Malan. It cleared up many of my misunderstandings and helped me build intuition around the whole CS playground. It starts with a basic introduction to C and some programming exercises, and ends up teaching the basics of PHP, JavaScript and HTML/CSS as well. The projects in this class are really awesome. The GitHub code repository for this class is at HERE

(F2) CS101x: MITx Introduction to Programming Using Python:

The course is an introduction to many of the important concepts in computer science.

It covers simple algorithms, asymptotic running times, classes, OOP, trees, exceptions, assertions, hashing and a whole lot of other stuff.

(F3) Algorithms and Data Structures - MIT OCW: CURRENTLY working on

(F4) Rice University: Comp Sci Mini Specialization

This is a series of 6 short but good courses. I worked through them because data science requires a lot of programming, and the best way to learn programming is by doing it. The lectures are good, but the problems and assignments are awesome. It consists of three main courses:

1> Interactive Programming in Python: The course starts by teaching Python but quickly moves into creating graphical user interfaces and games using Python in CodeSkulptor. I created some very basic games as part of the coursework. Some of them are:

- Guess The Number
- StopWatch
- Pong
- Memory
- BlackJack
- RiceRocks

2> Principles of Computing: This course builds on the previous one, but here the focus is more on thinking programmatically than on GUIs. The projects are really great, as the course progresses by creating games:

- Solitaire Mancala
- 2048
- Tic Tac Toe Using Monte Carlo
- Yahtzee
- Cookie Clicker
- Zombie Apocalypse
- Word Wrangler
- Tic Tac Toe Using Minimax
- Fifteen Puzzle

3> Algorithmic Thinking: This course starts with a focus on graph algorithms and data structures. The code is hosted on GitHub.

STATISTICS:

(F1) Stat 110: Introduction to Probability: Joe Blitzstein - Harvard University

Conditioning is the Soul of Statistics.

I took this course to enhance my understanding of probability distributions and statistics, but it taught me a lot more than that. Apart from learning to think conditionally, it also taught me how to explain difficult concepts with a story.

This was a hard class but definitely fun. The focus was not only on getting the mathematical proofs but also on understanding the intuition behind them and how intuition can help in deriving them more easily. Sometimes the same proof was done in different ways to facilitate learning of a concept.

One of the things I liked most about this course is the focus on concrete examples while explaining abstract concepts. The inclusion of the Gambler's Ruin Problem, Matching Problem, Birthday Problem, Monty Hall, Simpson's Paradox, St. Petersburg Paradox etc. made this course much more exciting than a normal statistics course.
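To give a flavour of the concrete examples named above, here is a minimal sketch of the Birthday Problem. It is not taken from the course materials; the function name and the simplifying assumption of 365 equally likely birthdays are illustrative choices.

```python
from fractions import Fraction


def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday.

    Assumes `days` equally likely birthdays (an idealisation).
    """
    if people > days:
        return 1.0  # pigeonhole principle: a match is certain
    no_match = Fraction(1)
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        no_match *= Fraction(days - k, days)
    return float(1 - no_match)


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(birthday_collision_probability(n), 4))
```

The complement trick (compute the chance of no match, then subtract from 1) is what makes the calculation short; with 23 people the probability already exceeds one half.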


I will definitely be on the lookout for more courses by Joe after this, and I have already done one more course by him - CS109. More on that later.

The Top 10 Ideas covered in this class are:

1. Probability, Conditioning is the soul of Statistics, Story Proofs
2. Bayes' Theorem, Law of Total Probability, First Step Analysis
3. Expectation and Variance for discrete RVs and continuous RVs. LOTUS.
4. Discrete (Bernoulli, Binomial, Hypergeometric, Geometric, Negative Binomial, FS, Poisson) and Continuous (Uniform, Normal, Expo, Beta, Gamma) Distributions and the stories behind them
5. Moment Generating Functions (MGFs) and their Properties
6. Joint and Marginal distributions, Covariance and Correlation
7. Convolutions and Transformations
8. Conditional Expectation - Adam and Eve Law
9. Law of Large Numbers and CLT
10. Markov Chains
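As an illustration of item 2 in the list above, the following sketch applies Bayes' Theorem together with the Law of Total Probability to a diagnostic-test scenario. The scenario and all of its numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and are not taken from the course.

```python
from fractions import Fraction


def posterior(prior: Fraction,
              p_pos_given_h: Fraction,
              p_pos_given_not_h: Fraction) -> Fraction:
    """P(H | positive) via Bayes' Theorem."""
    # Law of Total Probability: P(pos) = P(pos|H)P(H) + P(pos|not H)P(not H)
    evidence = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    return p_pos_given_h * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=Fraction(1, 100),
              p_pos_given_h=Fraction(95, 100),
              p_pos_given_not_h=Fraction(5, 100))
print(p, float(p))
```

Using exact fractions keeps the arithmetic transparent: even with a fairly accurate test, the posterior stays well below one half because the condition is rare, which is the kind of conditional thinking the course emphasises.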

Solving the problem sets and the midterm reviews helped me a lot in grasping the abstract concepts.

(F2) Stat 111: TODO

Uses DeGroot and Schervish for instruction. No lecture videos are available, so I plan to read the book and complete the problem sets online from the Stat 111 website. I so wish the lectures were there.

(A1) Bayesian Statistics STAT 544: TODO

A lecture series on Bayesian statistics by Jarad Niemi at ISU.

(A2) Discrete Stochastic Processes MIT OCW: TODO

I got highly interested in probability after STAT 110, so I added this here. It is an alternative to one of the follow-up courses, apart from STAT 111, that Professor Joe Blitzstein mentions in STAT 110.

MACHINE LEARNING:

(F1) MITx The Analytics Edge:

This is a fantastic course for learning about R as well as the implementations of various machine learning algorithms in R. Very basic, very crisp and very informative. The scenarios and examples range from Moneyball to Watson. The only problem with this course is that its problem sets feel a little repetitive.

Here is the location of my R code repository for this course

(F2) Intro to Data Science - University of Washington

My first ML class. It took a little long to grasp the concepts, but in hindsight that might be because of my lack of exposure to the material. It was my first grapple with tools like R and Python. It covers a whole lot of ground, from R to Python to MapReduce. I would put it here as it gives a thorough perspective of the whole data science space.

(F3) Data Science CS109: Again by Professor Blitzstein. Again an awesome course. Watch it after Stat 110, as you will be able to understand everything much better with a thorough grounding in Stat 110 concepts. You will learn about Python libraries for data science, along with a thorough, intuitive grounding in various machine learning algorithms. Course description from the website:

Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

(A1) CS229: Andrew Ng:

Contains the maths behind many of the machine learning algorithms. The game-changer machine learning course. I will put this course as numero uno, as it motivated me to get into this field, and Andrew Ng is a great instructor.

DISTRIBUTED AND PARALLEL COMPUTING:

(A1) Intro to Hadoop & MapReduce - Udacity

A very easy course, taught by Cloudera on Udacity, that covers the fundamentals of Hadoop Streaming with Python. I am doing much more advanced work with Python and MapReduce now, but this is one of the courses that laid the foundation.

(A2) BerkeleyX: Introduction to Big Data with Apache Spark and (A3) BerkeleyX: CS190.1x Scalable Machine Learning

A mighty flame followeth a tiny spark.

This is a series of courses on Spark taught by Anthony D. Joseph, a Professor in Electrical Engineering and Computer Science at UC Berkeley, and Ameet Talwalkar, a well-known name in the Spark community.

This course delivers on what it promises: it teaches Spark. Total beginners will have difficulty following it, as the course progresses very fast. That said, anyone with a decent understanding of how big data works will be OK.

The top ideas covered in this course are:

1. RDD Transformations (map, flatMap, filter, distinct, groupByKey, sortByKey, reduceByKey)
2. RDD Actions (reduce, takeOrdered, take, collect)
3. Accumulator and Broadcast Variables
4. DataFrames in PySpark
5. SQL on paired RDDs - leftOuterJoin, rightOuterJoin, fullOuterJoin
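To make the transformation and action names above more concrete, here is a plain-Python sketch of the classic word-count pipeline. It deliberately does not use Spark: each step only mimics, on a single machine, what the similarly named RDD operation does, and the function name and sample sentences are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines, top_n=3):
    """Single-machine analogue of flatMap -> map -> reduceByKey -> takeOrdered."""
    # flatMap: one line becomes many words, flattened into a single stream
    words = chain.from_iterable(line.lower().split() for line in lines)
    # map: each word becomes a (word, 1) pair
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the values that share the same key
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    # takeOrdered: highest counts first, ties broken alphabetically
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


print(word_count(["to be or not to be", "that is the question"]))
```

In Spark the same steps run lazily and in parallel across partitions; the sketch only shows the data flow, not the distribution.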

I certainly liked the Mini Projects in the class:

1. Wordcount in Spark - A word-counting program to count the words in all of Shakespeare's plays.
2. Apache Log File analysis in Spark - Use Spark to explore a NASA Apache web server log.
3. Entity Resolution - Entity resolution using TF-IDF approaches in Spark.
4. Movie Recommendation using ALS - Predicting movie ratings using Spark.
5. Linear Regression - Predicting song year using linear regression in Spark.
6. Logistic Regression - Predicting click-through rates using Spark. One-hot encoding and hashing explained.
7. PCA - Running PCA on neuroscience data.
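The one-hot encoding and hashing ideas mentioned in the logistic regression project can be sketched in a few lines of plain Python. This is an illustration only, not the course's or Spark's implementation: the function names, the bucket count, and the use of MD5 as a stable hash are all assumptions made for the example.

```python
import hashlib


def one_hot(categories, value):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    vocabulary = sorted(set(categories))  # fixed, reproducible column order
    return [1 if value == category else 0 for category in vocabulary]


def hashed_features(tokens, num_buckets=8):
    """Hashing trick: map tokens into a fixed-size count vector."""
    vector = [0] * num_buckets
    for token in tokens:
        # MD5 is used only because it is stable across runs, unlike the
        # built-in hash() for strings; this choice is an assumption.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % num_buckets] += 1
    return vector


print(one_hot(["mobile", "desktop", "tablet"], "mobile"))
print(hashed_features(["device=mobile", "site=news", "hour=14"]))
```

One-hot encoding needs one column per distinct category, which becomes unwieldy for high-cardinality click data; hashing caps the vector length at the cost of occasional collisions between unrelated tokens.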

Some of the courses here may seem repetitive, but each of them provided some additional skills, so I have included them.

I will update this answer for more details as I complete the TODO courses on the list.

Hope that Helps :)

Written Dec 17 • View Upvotes

74

Adam Marcus, taught a 6-day data literacy course
5.8k Views • Upvoted by William Chen, Data Scientist at Quora

I want to echo something Joseph Adler mentioned at the end of his answer: the thing that even academically well-equipped students will not have been exposed to is the toolbox required to triage and process a hunk of raw data they acquire from some source. Sprinkling in real-world datasets and data cleaning experience is key to a curriculum in data science.

Eugene Wu and I recently taught a 6-day (3 hours per day) course on data literacy basics targeted at computer science undergraduates [1]. Our initial motivation was selfish: as database researchers, we didn't have a lot of experience with an end-to-end raw data -> data product pipeline. After a few trial runs of our own, we realized certain data processing patterns kept showing up, and saw that we had a small course worth of content on our hands. The important thing here is that even with undergraduate- and graduate-level machine learning, statistics, and database courses under our belts, we still had a lot to learn about working with honest-to-goodness dirty data.

Each module of our course could have had an entire semester dedicated to it, and so we favored basic skills with lots of hands-on experience over intellectual depth and rigor. We kept lectures to 20-30 minutes, giving students the remaining 2.5 hours to go through the labs we set up while we walked around answering questions. Lectures allowed students to know what they were in for at a high level, and the lab portion allowed them to cement those concepts with real datasets, code, and diagrams. All of the course content is available at [1], and here is a direct link to day 1's lab [2].

The syllabus we covered was:

- Day 1: an end-to-end experience in downloading campaign contribution data from the Federal Election Commission, cleaning it up, and programmatically displaying it using basic charts.
- Day 2: visualization/charting skills using election and county health data.
- Day 3: statistics to take the hunches they got on day 2 and quantify them, learning about T-Tests and linear regression along the way.
- Day 4: text processing/summarization using the Enron email corpus.
- Day 5: MapReduce to scale up Day 4's analysis using Elastic MapReduce on Amazon Web Services. This felt a bit forced, but the students were clamoring for distributed data processing experience.
- Day 6: the students teach us something they learned on their own datasets using techniques we've taught them.
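The Day 3 topics can be sketched with nothing but the Python standard library. The code below is not from the course labs: the function names and the tiny datasets are hypothetical, and the t-statistic shown is Welch's unequal-variance form, chosen here as one common variant.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference between two sample means."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up numbers, used only to exercise the two functions.
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

Turning a t-statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package would normally be used for that step.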

While we set out to give computer science students with familiarity in Python programming a dive into data, we ended up with folks from the physical sciences, doctors, and a few social scientists who had their own datasets to answer questions about. The last day allowed them to experiment with their new skills on their own data. Attendance on this day was lower than the previous days: the majority of the folks in attendance on day 6 were on the more experienced end, and I suspect that the undergrads, who were not yet exposed to data problems of their own, didn't find it as engaging. It would be interesting to see how to develop course content that allows self-directed data science for students who still need a bit more inspiration.

I should also say that our attempt is not the first one to bring data to the classroom. Jeff Hammerbacher and Mike Franklin at Berkeley have a wonderful semester-length course on data science [3]. The high-level outline of the course seems similar, but they get farther into data product design, and jump into each topic in more depth. Their resources page [4] has a nice set of links to other educational efforts worth checking out.

[1] http://dataiap.github.com/dataiap/
[2] http://dataiap.github.com/dataia...
[3] http://datascienc.es/
[4] http://datascienc.es/resources/

Written Apr 11, 2012 • View Upvotes

35

Joseph Adler, Data Scientist at LinkedIn, O'Reilly Author
7.1k Views • Upvoted by Robert Chang, Data Janitor @ Twitter | Taiwanese American | Statistically educated | Aspiring singer • James Pitt

Today, we use the term "data science" to mean "doing stuff with data." Some data scientists build products, some optimize businesses, others try to understand businesses. Regardless of what a data scientist does, there are three things that a data scientist needs to understand to be effective:

(1) Math
(2) Computer Science
(3) The problem that he or she is solving

Let me explain a little more about each one.

(1) Math. Whether you have a lot of data or a little bit, you're going to have to use some math to make sense of it. Math helps you find patterns in data and determine if those patterns are meaningful. In practice, this means a data scientist needs to know some statistics and machine learning. It's helpful to know some algebra, signal processing, and topology as well. (Seriously.)

(2) Computer Science. Today, almost all the data that you encounter will be generated by and stored on computers. Often, you'll have to shrink that data, clean it up, or combine it with other data. Sometimes, you'll have so much data that you can't solve your problem quickly. In order to work with data, you'll have to know how to program a computer. But in order to cope with large amounts of data, you'll need to know about computer architecture and algorithms. You may even have to work with data that's stored in a cloud or processed on a distributed system. I'd recommend that any data scientist learn the basics of software engineering, algorithms, and computer architecture.

(3) The problem that he or she is solving. If you understand the problem you are trying to solve, and the data that you are trying to use, you will be able to distinguish answers that make sense from answers that do not, think of novel data sources to look for, and think of new ways to solve problems. Don't underestimate the importance of understanding economics, physics, biology, or human psychology when you're tackling a problem. In practice, I'd recommend that a data scientist should have some training in economics (specifically econometrics and game theory), but any scientific training is helpful.

And finally, I wouldn't underestimate the value of experience. There's a lot of stuff that I've learned the hard way about cleaning data, running experiments, and implementing solutions. Academic training is a great start, but the real world is complicated and changes quickly. Any good training program needs to include some big, hands-on projects with real-world data (not clean toy data sets).

Written Apr 11, 2012 • View Upvotes

68

Guy Cuthbert, Data Animator, https://uk.linkedin.com/in/guycuthbert
2k Views • Upvoted by Ankit Sharma, Data Scientist at DataRPM

Lots of great answers here on the technical stuff, which is great, but too many graduate data scientists (and variants on that theme: statisticians, data programmers, data analysts etc.) are unable to communicate their findings effectively. This is a recurring theme in my experience (see Skills for Big Data?), so I would suggest that, in addition to solid maths & computer science skills, you should add:

- Data visualisation (even light graphic design principles)
- Rhetoric (yes, I'm serious!)
- Cognitive psychology (still serious)

Those may sound like odd suggestions, but an element of all three makes a huge difference; an effective data scientist should be able to explain findings in a way that the audience understands.

For the 1% who only need to communicate with engineers, you're fine with your statistics and maths proofs... for the rest of us, the audience will consist of businesspeople who want to understand enough of your findings, with confidence in your method, to take some form of corrective action.

In order to communicate effectively with this kind of audience you need to be a storyteller, able to explain:

- What data you used, in their terminology (requiring you to have some domain expertise)
- How you explored that data and discovered interesting patterns (visualisation helps massively here)
- Why you believe that your findings are important (rhetorical skill helps you shape and persuade your audience, focusing on their needs not yours)

Above all, you need to ensure that your audience learns from your story and acts upon it, so a little cognitive psychology will help you explain to the audience their natural biases and how to detect and avoid false patterns, and will certainly help you shape visualisations which convey the message you intend to deliver.

Written Mar 9, 2013 • View Upvotes

17
