Haritha Thilakarathne
Software Engineer – Data Science & Analytics
Tech One Global – Enadoc Dev Center
http://haritha.me
Data science is a multidisciplinary blend of data inference, algorithm development, and technology, used to solve analytically complex problems.
• Making decisions
• Confirming hypotheses
• Gaining insights
• Predicting the future
Big Data Manipulation & Analysis
Data Mining
Data Visualization
Detail on distribution of artworks in the Tate collection by birthdate of artists, visualized by Florian Krautli.
Data Collection & Preparation
• Extracting data from difficult sources
• Filling in missing values
• Removing suspicious data
• Making formats, encoding, and units consistent
• De-duplicating and matching
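The preparation steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the records, the field names, the 120-year age cutoff, and the choice of the median as a fill value are all assumptions made for the example, not part of the original slides.

```python
import statistics

# Hypothetical raw records (invented for illustration).
raw = [
    {"name": "Alice ", "height": "170 cm", "age": "34"},
    {"name": "alice", "height": "1.70 m", "age": "34"},   # duplicate of Alice
    {"name": "Bob", "height": "182 cm", "age": ""},       # missing age
    {"name": "Carol", "height": "165 cm", "age": "290"},  # suspicious age
    {"name": "Dan", "height": "1.75 m", "age": "41"},
]


def height_cm(text):
    """Make units consistent: accept '170 cm' or '1.70 m', return centimetres."""
    value, unit = text.split()
    return float(value) * 100 if unit == "m" else float(value)


def clean(records, max_age=120):
    rows, seen = [], set()
    for rec in records:
        # Make formats consistent, then de-duplicate on the normalised name.
        name = rec["name"].strip().title()
        if name in seen:
            continue
        seen.add(name)
        age = int(rec["age"]) if rec["age"] else None
        # Remove suspicious data (assumed rule: nobody is older than max_age).
        if age is not None and age > max_age:
            continue
        rows.append({"name": name, "height_cm": height_cm(rec["height"]), "age": age})
    # Fill in missing values with the median of the known ages.
    fill = statistics.median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
    return rows


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

In practice each rule (what counts as a duplicate, what counts as suspicious, how to fill a gap) depends on the dataset, so the thresholds here should be read as placeholders.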
Correlation and Causation
• Correlation – Values track each other
  • Height and Shoe Size
  • Grades and Entrance Exam Scores
• Causation – One value directly influences another
  • Education Level -> Starting Salary
  • Temperature -> Cold Drink Sales
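How strongly two values track each other is commonly measured with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand for the height and shoe size example; the numbers are invented for illustration, and `pearson` is a helper written for this example rather than a library function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical measurements (invented for illustration).
heights_cm = [155, 160, 168, 175, 183, 190]
shoe_sizes = [36, 37, 39, 41, 43, 45]

r = pearson(heights_cm, shoe_sizes)
print(f"correlation between height and shoe size: {r:.3f}")
```

A coefficient close to 1 only says the two values move together. It does not show that one influences the other, which is the distinction the slide draws between correlation and causation.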
Overfitting & Underfitting
Languages, Systems, Platforms
• Spreadsheets
• Programming Languages (R/Python)
• Relational Database Management Systems
• NoSQL Systems (Cassandra/DocumentDB/MongoDB)
• Specialized Languages on scalable systems (MapReduce/Hadoop)
• Systems for data visualization (PowerBI/Tableau)
• Data Processing on Cloud (Azure, Amazon Web Services)
Regression
Regression Goal: A function f, applied to the training data, should produce values that are, in aggregate, as close as possible to the actual outputs.
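One common way to make "as close as possible in aggregate" concrete is to choose f as a straight line and minimise the mean squared error over the training data. The sketch below assumes that choice; the temperature and sales figures are invented, and `fit_line` and `mean_squared_error` are helper names introduced for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(f, xs, ys):
    """Aggregate distance between predictions f(x) and actual outputs y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: daily temperature (°C) and cold drinks sold.
temps_c = [18, 22, 25, 29, 33]
sales = [120, 150, 185, 210, 250]

slope, intercept = fit_line(temps_c, sales)


def f(x):
    return slope * x + intercept


error = mean_squared_error(f, temps_c, sales)
print(f"f(x) = {slope:.2f} * x + {intercept:.2f}, MSE = {error:.2f}")
```

Any other line through the same data gives a larger mean squared error on this training set, which is the sense in which the fitted f is "closest in aggregate". Other aggregates, such as mean absolute error, would lead to a different fitted line.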
Classification
Clustering
Neural Networks