How I did my Master Thesis

Amendra Shrestha

Uppsala University

January 10, 2017

How to solve a problem with machine learning

Introduction Workflow Data Preparation Modeling Process Experiment

Introduction

Find a topic

Reading

• papers

• online machine learning courses (coursera.org)

• Machine Learning Mastery for self-study (http://machinelearningmastery.com)

• writing

Data

• crawl data from social media, discussion forums, blogs

• download from archives
    • KONECT (http://konect.uni-koblenz.de)
    • UCI (http://archive.ics.uci.edu/ml/datasets.html)
    • Kaggle (http://blog.kaggle.com)
    • Språkbanken (https://spraakbanken.gu.se/)

Project workflow

• Data Preparation
    • data cleaning
    • data preparation
    • feature vector creation

• Modeling Process
    • feature selection
    • transformation
    • missing data
    • model generation
    • model selection

Data Preparation

Cleaning data

• removing duplicates

• impossible values
    • negative ages

• misspelt words

• inconsistent time formats

• unwanted elements
    • text: quotes, retweets, strange symbols, URLs, punctuation, function words

• outliers
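
A few of the cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the method used in the thesis: the record layout (`age`, `text`), the tiny `FUNCTION_WORDS` list and the retweet pattern are all assumptions made for the example.

```python
import re
import string

# Small illustrative stop list (assumption); a real project would use a fuller one.
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it"}

URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")  # assumed "RT @user:" prefix
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Strip a retweet prefix, URLs, punctuation and function words."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)          # remove URLs before punctuation
    text = text.translate(PUNCT_TABLE).lower()
    tokens = [t for t in text.split() if t not in FUNCTION_WORDS]
    return " ".join(tokens)


def clean_records(records):
    """Drop records with impossible ages and duplicate texts."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["age"] < 0:               # impossible value
            continue
        text = clean_text(rec["text"])
        if text in seen:                 # duplicate after cleaning
            continue
        seen.add(text)
        cleaned.append({"age": rec["age"], "text": text})
    return cleaned


records = [
    {"age": 25, "text": "RT @bob: The model is great! http://example.com"},
    {"age": 25, "text": "the model is great"},
    {"age": -3, "text": "negative age"},
]
cleaned = clean_records(records)
```

Here the second record becomes a duplicate of the first once both are cleaned, and the third is dropped for its negative age, so only one record remains.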

Preparation of data

• alterations of data

• stemming and lemmatization of text

• standardization of units
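
As an illustration of bringing units to a common standard, the sketch below converts lengths to metres. The unit labels and the `to_metres` helper are assumptions for the example; stemming and lemmatization are usually delegated to an NLP library and are not shown here.

```python
# Conversion factors to metres; the unit labels are illustrative assumptions.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001, "in": 0.0254, "ft": 0.3048}


def to_metres(value, unit):
    """Convert a length to metres, raising ValueError on an unknown unit."""
    try:
        return value * TO_METRES[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


# Mixed-unit measurements brought to one unit.
heights = [(180, "cm"), (1.75, "m"), (70, "in")]
heights_m = [round(to_metres(v, u), 3) for v, u in heights]
```

Failing loudly on an unknown unit is a deliberate choice: silently passing the value through would reintroduce the inconsistency the step is meant to remove.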

Feature vector

• n-dimensional vector of features

• types
    • data dependent
    • data independent

• text
    • bag of words
    • term frequency
    • tf-idf
    • n-grams
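
The text features listed above can be sketched with the standard library alone. This is one possible formulation under stated assumptions: documents are already tokenized, term frequency is the relative count within a document, and idf is `log(N / df)`, which is only one of several common variants.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: c / total for term, c in counts.items()}


def tf_idf(docs):
    """Return (vocabulary, vectors) with one tf-idf vector per document.

    Uses idf = log(N / df), one common variant among several.
    """
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = term_frequency(doc)
        vectors.append([tf.get(t, 0.0) * idf[t] for t in vocab])
    return vocab, vectors


docs = [["good", "model", "good"], ["bad", "model"]]
bag = [Counter(doc) for doc in docs]   # bag of words: raw counts per document
vocab, vectors = tf_idf(docs)
bigrams = ngrams(docs[0], 2)
```

Each document becomes a vector with one dimension per vocabulary term. A term that occurs in every document ("model" here) gets a tf-idf weight of zero under this idf variant.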

Modeling Process

Feature selection

use a minimal number of maximally informative features, to reduce:

• noise

• overfitting

• computational load

how to find the best features?

• background/expert knowledge

• pairwise statistical analysis

• model validation
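
One simple form of pairwise statistical analysis is to rank features by their correlation with the target. The sketch below uses the Pearson coefficient; the feature names and data are made up for illustration, and other statistics could be used in the same way.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:      # a constant column carries no information
        return 0.0
    return cov / (sx * sy)


def rank_features(columns, target):
    """Sort feature names by absolute correlation with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical feature columns and target values.
columns = {
    "useful": [1.0, 2.0, 3.0, 4.0],
    "noisy": [2.0, 1.0, 4.0, 3.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
}
target = [1.1, 1.9, 3.2, 3.9]
ranking = rank_features(columns, target)
```

A ranking like this only captures linear, one-feature-at-a-time relationships, which is why it is usually combined with expert knowledge and model validation.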

Transformation

• scaling

• PCA
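
Two common ways of scaling a feature column are sketched below: standardization to zero mean and unit variance, and min-max scaling to [0, 1]. The function names and sample data are assumptions for the example; PCA is normally taken from a numerical library and is not shown.

```python
import math


def standardize(column):
    """Rescale to zero mean and unit variance (z-scores)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:                 # constant column: nothing to scale
        return [0.0] * n
    return [(x - mean) / std for x in column]


def min_max(column):
    """Rescale linearly to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


ages = [20.0, 30.0, 40.0]
z = standardize(ages)
scaled = min_max(ages)
```

In practice the mean, standard deviation, minimum and maximum would be computed on the training set only and then reused for validation and test data.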

Dealing with incomplete data

• use a model that can deal with missing items

• throw away the incomplete entries

• fill in with a simple statistic (not recommended)
    • mean, median, mode

• kNN (k-nearest neighbours) imputation
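
A minimal sketch of kNN imputation, assuming numeric rows where a missing value is marked with `None`. The choice of Euclidean distance over the observed columns and of averaging the neighbours' values is an assumption for illustration; library implementations differ in detail.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Distance is Euclidean over the columns observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def distance(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=distance)[:k]
        filled = [
            v if v is not None
            else sum(nb[i] for nb in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        result.append(filled)
    return result


rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [9.0, 90.0],
    [1.5, None],   # missing second feature
]
imputed = knn_impute(rows, k=2)
```

The incomplete row is closest to the first two rows, so its missing value is filled with their average rather than with a column-wide mean that the distant third row would distort.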

Model generation

• supervised learning
    • artificial neural network
    • decision tree learning
    • support vector machines
    • random forests

• unsupervised learning
    • clustering

Model selection

• how well the model performs on new data

• split the data into training, validation and test sets

• cross validation
    • divide data into n subsets
    • generate n models, each using all but one subset
    • test each model on the hold-out subset
    • combine results
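
The cross-validation steps above can be sketched as follows. The `fit` and `score` callables, the majority-label stand-in model and the interleaved (unshuffled) fold assignment are assumptions for illustration; real data would normally be shuffled first.

```python
from collections import Counter


def k_fold_indices(n_items, n_folds):
    """Split range(n_items) into n_folds nearly equal, disjoint subsets."""
    return [list(range(i, n_items, n_folds)) for i in range(n_folds)]


def cross_validate(xs, ys, n_folds, fit, score):
    """Train n_folds models, test each on its hold-out subset, average."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model,
                            [xs[i] for i in held_out],
                            [ys[i] for i in held_out]))
    return sum(scores) / len(scores)   # combine results


# Hypothetical stand-in model: always predict the majority training label.
def fit_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)


xs = list(range(6))
ys = ["a", "a", "a", "a", "a", "b"]
folds = k_fold_indices(len(xs), 3)
mean_accuracy = cross_validate(xs, ys, 3, fit_majority, accuracy)
```

Passing `fit` and `score` as arguments keeps the loop independent of the model, so the same routine can compare different candidate models on identical folds.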

Experiment


Thank You