Data Scientist Enablement DSE 400 - Fast Track to Data Science Week 4 Roadmap Advanced Center of Excellence Modern Renaissance Corporation In Collaboration with SONO team and others Content of this document is under Creative Commons Licence CC BY 4.0


Page 1: Data scientist enablement   dse 400   week 4 roadmap

Data Scientist Enablement
DSE 400 - Fast Track to Data Science

Week 4 Roadmap

Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others

Content of this document is under Creative Commons Licence CC BY 4.0

Agenda

You can always find the latest version of this document at http://bit.ly/1g8tMKM

Week 4 Overview, Discussions, Learning Path, Activities, Assignment, Submission, Looking ahead, References, Citation

Discussions: Big Data - top blog posts from 2013. Evolving Darwin Genetic Algorithm. Optional Q&A.

Learning plan: Read R for Machine Learning by Allison Chang, Introduction to Machine Learning, etc.

Activities: Try visualization through spreadsheets. Implement functions in R. Build a personal roadmap.

Assignment 4: Survey Paper - How Big Data is being used in your industry.

DSE 400 - Week 4 at a glance

Discussion 1: Read Top 8 Big Data Posts from December 2013. Pick the post that interests you most. Comment on what you like most about it and how its insights can be applied.

Discussion 2: Watch the video Evolving Darwin - Genetic Algorithm and comment on it. Does it sound like a valid machine learning approach? What are its strengths and weaknesses, if any? How would you improve it?

These discussions are required. If you already have access to SONO > DSE 400, you will be required to participate in these discussions. There will also be an Optional Q&A.

Please do not create additional threads in weekly KCs.

Social Engagement on SONO - Week 4
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1004

<Practice> Write a user-defined function in R that takes an integer N and outputs the sum of the first N odd numbers. Using this function, verify that the sum of the first N odd integers is given by the formula N^2 (i.e., N*N or N-squared).
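One possible approach to this exercise is sketched below. The function name sum_first_odd and the range 1 to 20 used for the check are illustrative choices, not part of the course material.

```r
# Sum of the first n odd numbers: 1 + 3 + ... + (2n - 1).
# sum_first_odd is a hypothetical name chosen for this sketch.
sum_first_odd <- function(n) {
  # Accept only a single non-negative whole number.
  stopifnot(is.numeric(n), length(n) == 1, n >= 0, n == floor(n))
  if (n == 0) return(0)
  # seq() builds the odd numbers 1, 3, 5, ... with n terms in total.
  sum(seq(1, by = 2, length.out = n))
}

# Compare the computed sums with the closed form n^2 for n = 1..20.
n_values <- 1:20
computed <- sapply(n_values, sum_first_odd)
all(computed == n_values^2)
```

The check covers only a finite range, so it illustrates the formula rather than proving it. An induction argument would be needed for a proof.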

Activities

<Practice> Gather the data on 2010 Winter Olympic medals. Visualize this data using a spreadsheet that shows the geographic distribution pattern of these medals. If you use Google Spreadsheet, this pattern may look like the adjacent picture. Later on you can repeat this exercise for the 2014 Winter Olympics.

<Practice> The Sieve of Eratosthenes is an algorithm that generates all prime numbers between 1 and a given number N by eliminating the multiples of each prime number. Write an R function that implements the Sieve of Eratosthenes.
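A minimal sketch of the sieve in R follows, as one way to start on this exercise. The function name sieve_primes is hypothetical, and starting the elimination at p * p is a common optimization rather than something the slide prescribes.

```r
# Sieve of Eratosthenes: return all primes between 1 and n.
# sieve_primes is a hypothetical name chosen for this sketch.
sieve_primes <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1)
  if (n < 2) return(integer(0))
  n <- floor(n)
  # is_prime[i] stays TRUE until i is eliminated as a multiple of a prime.
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  p <- 2
  while (p * p <= n) {
    if (is_prime[p]) {
      # Smaller multiples of p were already eliminated by smaller primes,
      # so elimination can start at p * p.
      is_prime[seq(p * p, n, by = p)] <- FALSE
    }
    p <- p + 1
  }
  which(is_prime)
}

sieve_primes(30)
```

The logical vector needs memory proportional to N, which is fine for practice-sized inputs but worth keeping in mind for very large N.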

<Practice> Build a personal Career Advancement Roadmap. Focus on your career over a 5-10 year horizon. Take an inventory of your current strengths and capabilities. Reflect on your career ambitions and add them to this roadmap. Use the DSE Roadmap to enhance your capabilities and move you towards the desired goals. What other skills and competencies do you need to advance yourself? Use open knowledge repositories like ocw.mit.edu to examine the additional capabilities you can assimilate.

Activities


Assignment 4 - Submission Required

Prepare a small survey (i.e., overview) paper (2-5 pages) on Big Data and its impact on your industry or area of focus. If you do not have a preferred industry or area of focus, choose either the Retail or the Telecom sector. Use pictures and infographics in your paper to make it readable. As an example, you may refer to The 'big data' revolution in healthcare - McKinsey & Company report. Your assignment doesn’t have to be this exhaustive. It is enough if you give an overview and make it readable for any audience. You can use blogs, newspaper articles, webinars, LinkedIn forums, etc. to gather material for your survey.

If you do not have access to commercial word processing packages, you can use Google Docs, OpenOffice.org, or a similar free or open-source package.

Submissions

Deadline: Saturday, 11:59 PM your local time.

Mail Assignment 4 to <[email protected]>. Submit a single PDF document showing your Big Data Survey. Use this naming convention for your document: DSE 400 - Assignment 4 - Your Full Name. No document links should be sent. Just one single PDF document, please. Please add DSE 400 > Assignment 4 in the subject line.

Week 5: Visualizations. Submit your research: Data Visualization Tools - A Comparative Study.

Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing etc.

Week 8: Ethics, Privacy and Building Data Products.

DSE 400 - Weeks 5-8 ahead


References, Resources and Additional Reading

[MIT OCW] R for Machine Learning by Allison Chang
An Introduction to Machine Learning. Hilary Mason, O’Reilly Media Inc., 2011
Machine Learning. Tom Mitchell, McGraw-Hill Publishers, 1997
Advanced Machine Learning. Hilary Mason, O’Reilly Media Inc., 2012
Scaling Up Machine Learning. Bekkerman, Bilenko, and Langford, O’Reilly Publishers, 2011
[MIT OCW] Prediction: Machine Learning and Statistics
Stanford University Machine Learning Video Collection
Caltech Machine Learning Video Collection

For More Information

Week 4 discussions take place during this week on SONO DSE 400 Week 4

<Help On Demand> You may reach out to Ms. Rachel Fleming <[email protected]> if you have any difficulties with the assignments or are looking for more activities.

If you have any questions or suggestions on SONO, please reach out to Mr. Eric Kmeic <>

We welcome questions, thoughts and suggestions. Post these on SONO in the right forum/discussion or write to us at <[email protected]>


Fun@Work

In 1859, Charles Darwin published On the Origin of Species, which is regarded as one of the monumental works in human history. In this work, he explained that life on earth adapts to a constantly changing environment by means of natural selection.

Thank You