Data Scientist Enablement DSE 400 - Fast Track to Data Science Week 1 Roadmap Advanced Center of Excellence Modern Renaissance Corporation In Collaboration with SONO team and others Content of this document is under Creative Commons Licence CC-BY-4.0

Data scientist enablement dse 400 - week 1 roadmap


Page 1: Data scientist enablement   dse 400 - week 1 roadmap

Data Scientist Enablement
DSE 400 - Fast Track to Data Science

Week 1 Roadmap

Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others

Content of this document is under Creative Commons Licence CC-BY-4.0

Page 2: Data scientist enablement   dse 400 - week 1 roadmap

Agenda

You can always find the latest version of this document at bit.ly/1hC5wAV

Welcome
Mission and Objectives
DSE Roadmap
DSE 400 at a glance
Week 1 at a glance
Discussions
Learning
Practice
Assignments and Submission
Looking ahead
References
Acknowledgement

In God we trust. All others must bring data. - W. Edwards Deming

Page 3: Data scientist enablement   dse 400 - week 1 roadmap

Welcome

Welcome to the DSE 2014 Track. You are on one of the most exciting programs to disseminate knowledge, diffuse advancements and stimulate adoption of Data/Decision Sciences, Big Data Analytics and what we call Evidence-Oriented Systems Engineering. The content and the courses are designed to be easy, engaging and engendering. We also hope this program will be most rewarding for you from intellectual, pragmatic and professional development perspectives.

Page 4: Data scientist enablement   dse 400 - week 1 roadmap

Mission and Objectives

The mission of our program is to provide free, open and world-class enablement of Data Scientists and to help advance the profession of Data Science as well as allied disciplines.

We aim to prepare the participants with analytical and practical skills emphasizing breadth and depth in a range of relevant disciplines and capabilities in Data/Decision Sciences, Big Data Analytics, Architecture and Systems Engineering.

Page 5: Data scientist enablement   dse 400 - week 1 roadmap

Data Scientist Enablement Roadmap - 2014

Fast track to Data Science

Machine Learning with R

Modern Data Platforms

Advanced Techniques in Big Data Analytics

“A Data Scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as being human.”

- Rachel Schutt and Cathy O’Neil, Doing Data Science

Page 6: Data scientist enablement   dse 400 - week 1 roadmap

DSE 2014 with tentative timeline

Fast track to Data Science (DSE 400): Jan 19 - Mar 15

Machine Learning with R (DSE 501): Mar 30 - May 10

Modern Data Platforms (DSE 502): May 25 - July 5

Advanced Techniques in Big Data Analytics (DSE 600): July 20 - Aug 30

Page 7: Data scientist enablement   dse 400 - week 1 roadmap

Introductory course with NO pre-requisites. It employs a socialized learning paradigm involving individual effort, team work, discussions and collaboration on the SONO (Social Knowledge) platform.

Topics include Algorithms, Statistical Inference, Data Analysis, Hadoop, R, Data Engineering, Machine Learning, Visualization, Applications and Case Studies, employing a variety of tools and techniques.

DSE 400 at a glance

Page 8: Data scientist enablement   dse 400 - week 1 roadmap

Discussions (on SONO): Welcome, Introductions, Programming and Analytics background etc.

Reading plan: Read Chapters 1-3 from An Introduction to Data Science by Jeffrey Stanton and Big Data [sorry] & Data Science: What Does a Data Scientist Do?

Activities: Installing R and R-Studio; Fun with Math; Playing with ML Datasets; Research on Data Visualization tools etc.

Assignment 1: Download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive. Import this dataset into your R environment and display it.

Submission

DSE 400 - Week 1 at a glance

Page 10: Data scientist enablement   dse 400 - week 1 roadmap

Discussion 1: Welcome to the DSE program.

Discussion 2: What programming languages are you familiar with? What languages do you use on a day-to-day basis? Do you have any experience using the R language? What kind of Analytics tools, if any, have you used before?

<Optional> Discussion 3: Q&A. We will focus on topics central to Week 1, but general questions are also welcome.

To participate in these discussions visit DSE 400 Week 1 at http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1001

Social Engagement on SONO - Week 1

Page 12: Data scientist enablement   dse 400 - week 1 roadmap

<Required> Visit http://www.rstudio.com/ and follow the instructions to download and install R and R-Studio. For specific advice on your system and its configuration, several how-to videos on installing R and R-Studio can be found on YouTube. Skip this activity if you already have R and R-Studio.

<Collaborative Research> <Required> Create a presentation on Data Visualization Tools - A Comparative Study. Incorporate your unique ideas, research and collective insights to arrive at the right evaluation methodology, explain your thought process and justify your choices. Note: You will build this presentation over 4 weeks. You and your team will present it during the 5th week.

Activities

Page 13: Data scientist enablement   dse 400 - week 1 roadmap

<Practice> Math is Fun. Create a bar chart quickly with 10 random values using Data Graphs widget at Math is Fun website. Change graph to Pie Chart. Display percentages only, not the original values.
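Once R is installed, the same exercise can also be tried there. The lines below are only a rough sketch using base R graphics, not part of the Math is Fun activity itself. The seed, the range of the random values and the labels A to J are arbitrary choices made for illustration.

```r
# Ten random values between 1 and 100 (seed and range are arbitrary assumptions).
set.seed(400)
values <- sample(1:100, 10)
names(values) <- LETTERS[1:10]

# Bar chart of the raw values.
barplot(values, main = "Ten random values", ylab = "Value")

# Pie chart labelled with percentages only, not the original values.
pct <- round(100 * values / sum(values), 1)
pie(values, labels = paste0(pct, "%"), main = "Share of total")
```

Because the percentages are rounded to one decimal place for display, the labels may not add up to exactly 100.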

<Practice> Visit the UCI Machine Learning Repository. Familiarize yourself with the various datasets at this site. Feel free to download any dataset you like. We will be using this repository extensively in the DSE program. For Week 1 our focus is on just the “Housing” dataset.

Activities - contd

Page 14: Data scientist enablement   dse 400 - week 1 roadmap

Download R-Studio, in case you have not already done so. Download the Housing dataset from the UCI Machine Learning Repository to your local machine or cloud drive. Import this dataset into your R environment and display it. Show a screenshot of your environment. (See the sample image in the next slide.)

http://archive.ics.uci.edu/ml/datasets.html

Assignment 1 - Submission Required
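One way to approach the import step of Assignment 1 in R is sketched below. It assumes the Housing data file has already been downloaded and saved as housing.data in the working directory (a hypothetical file name), and that the file is whitespace-separated with one record per line and no header row. The 14 column names follow the attribute list in the dataset's own documentation, not anything stated in these slides, and the helper name read_housing is made up for illustration.

```r
# Hypothetical helper: read a local copy of the Housing data into a data frame.
# Assumes whitespace-separated values, one record per line, no header row.
read_housing <- function(path) {
  cols <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
            "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV")
  read.table(path, header = FALSE, col.names = cols)
}

# Assumed local file name; change it to wherever you saved the download.
housing_file <- "housing.data"

if (file.exists(housing_file)) {
  housing <- read_housing(housing_file)
  str(housing)          # column names and types
  print(head(housing))  # first six rows
} else {
  message("File not found: ", housing_file,
          ". Download the Housing data first, then rerun.")
}
```

In R-Studio, View(housing) opens the same data frame in the spreadsheet-style viewer, which may be convenient for the screenshot.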

Page 15: Data scientist enablement   dse 400 - week 1 roadmap

Assignment 1 - Example screenshot

Page 16: Data scientist enablement   dse 400 - week 1 roadmap

Submission deadline: Saturday, Jan 25, 11:59 PM your local time.

Submit <mail to [email protected]> the screenshots of your R workspace (on your machine/laptop/desktop) showing the Housing dataset. You can either paste the image into the body of the email or create a document in PDF format and send it as an attachment. No links please.

Page 17: Data scientist enablement   dse 400 - week 1 roadmap

Fun@Work
DSE Participant Distribution Pattern

Page 18: Data scientist enablement   dse 400 - week 1 roadmap

Fun@Work
Tagcloud of professional backgrounds of DSE Participants

Page 19: Data scientist enablement   dse 400 - week 1 roadmap

Week 2: Basic Statistics, Hypothesis Testing, Regression, Playing with Spreadsheets, Visualization with R. If you are new to Statistics or need a refresher, read ahead in Think Stats: Probability and Statistics for Programmers or watch the Statistics Playlist by Khan Academy.

Week 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes, Recommendations and Boosting algorithms.

Week 5: Visualizations. Present your research on Data Visualization Tools - A Comparative Study.

Week 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.

Week 8: Ethics, Privacy and Building Data Products.

DSE 400 - Weeks 2-8 ahead

Page 20: Data scientist enablement   dse 400 - week 1 roadmap

References and Additional Reading

An Introduction to Data Science by Jeffrey Stanton. This is a good introduction to Data Science for non-technical readers. This book is available under Creative Commons Licence.

Learning R - Video Tutorial Lessons on YouTube

R for Machine Learning by Allison Chung

The Value of Big Data Isn't the Data - HBR Article

[MIT OCW] Prediction, Machine Learning and Statistics

Page 21: Data scientist enablement   dse 400 - week 1 roadmap

Housing Data Set Information: Concerns housing values in suburbs of Boston.

Origin: This dataset was taken from the StatLib library, which is maintained at Carnegie Mellon University.

Creator: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.

Content that appears as is on this document only is under Creative Commons BY-NC-SA. This license may not apply to material referenced here.

Citation

Page 22: Data scientist enablement   dse 400 - week 1 roadmap

For More Information

The DSE 2014 stream is all set to commence on Jan 19, 2014. For more details, visit the DSE 400 Announcement Page at bit.ly/18zPE1j

Visit DSE 2014 Global to participate in DSE and to get to know the DSE Core Team and participants. Week 1 discussions can be found at DSE 400 Week 1.

We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <[email protected]>

You can always find the latest version of this document at bit.ly/1hC5wAV

Page 23: Data scientist enablement   dse 400 - week 1 roadmap

We thank our community of committed and passionate volunteers, experts, educators, innovators, benefactors, advisers, advocates, mentors and supporters. We are also grateful for the outstanding support and encouragement from the SONO team as well as other organizations such as R-Project, Open Courseware Consortium, MIT, IBM, Creative Commons, HortonWorks, Stanford University, Caltech and Data Science Central.

Acknowledgement

Page 24: Data scientist enablement   dse 400 - week 1 roadmap

Thank You