
A Brief Introduction and Issues on the Classification Problem

Jin Mao, Postdoc, School of Information, University of Arizona

Sept 18, 2015

Outline

Classification Examples

Spam email filtering

Fraud detection

Self-piloting automobile

The Classification Problem


Classic Classifiers

Naïve Bayes

Decision Tree : J48(C4.5)

KNN

RandomForest

SVM : SMO, LibSVM

Neural Network
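To give a flavor of how simple some of these classifiers are at their core, here is a minimal pure-Python sketch of k-nearest neighbors; the toy data and k=3 are illustrative assumptions, not from the slides:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(p, x), label)
                   for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters on a line.
train_X = [(0.0,), (0.5,), (1.0,), (5.0,), (5.5,), (6.0,)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.8,)))  # near the "a" cluster -> "a"
```

Real implementations (e.g. the KNN in Weka alongside J48 and SMO) add distance weighting and efficient neighbor search, but the voting idea is the same.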

How to Choose the Classifier?

Observe your data: amount of data, features

Consider your application: precision/recall needs, explainability, incremental updates, computational complexity

Decision trees are easy to understand, but cannot predict numerical values and can be slow.

Naïve Bayes is fairly robust and easy to update incrementally.

Neural networks and SVMs are "black boxes". SVMs are fast at yes/no predictions.

Never mind: you can try all of them.

Model Selection with Cross Validation
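Model selection with cross validation can be sketched without any library: split the data into k folds, train on k−1 of them, score on the held-out fold, and average. The fit/predict pair below is a majority-class baseline, and the data are made-up assumptions; any classifier with the same fit/predict shape could be compared by its mean CV accuracy:

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_accuracy(fit, predict, X, y, k=5):
    """Mean held-out accuracy over k folds for a fit/predict pair."""
    folds = kfold_indices(len(X), k)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accs.append(correct / len(test_idx))
    return sum(accs) / len(accs)

# Baseline "classifier": always predict the most common training label.
def fit_majority(X, y):
    return max(set(y), key=y.count)

def predict_majority(model, x):
    return model

X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
print(cv_accuracy(fit_majority, predict_majority, X, y))  # 0.7
```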


Train Your Classifier

Obtain Training Set

Instances should be labeled:

From running systems in practice

Annotated by multiple experts (check inter-rater agreement)

Crowdsourcing (Google’s Captcha)

…
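One common way to check inter-rater agreement between two expert annotators is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. A minimal sketch, with two made-up rating lists:

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each annotator's label frequencies.
    expected = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (observed - expected) / (1 - expected)

r1 = ["yes", "yes", "no", "yes", "no", "no"]
r2 = ["yes", "no", "no", "yes", "no", "yes"]
print(cohens_kappa(r1, r2))  # 1/3: agreement only modestly above chance
```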

Obtain Training Set

Large enough

More data can reduce noise.

The benefit of enough data can even dominate that of the choice of classification algorithm.

Redundant data will help little.

Selection strategies: nearest neighbors, ordered removals, random sampling, particle swarms, or evolutionary methods

Obtain Training Set

Unbalanced training instances across classes

Evaluation: with simple measures such as precision/recall, a classifier that predicts only the majority class (the class with many samples) can still score a high rate. (AUC is better.)

Not enough information in the rare class's features to find the class boundaries.

Obtain Training Set

Strategies:

Divide the majority class into L distinct clusters, train L predictors, and average them into the final one.

Generate synthetic data for the rare class (SMOTE).

Reduce the imbalance level: cut down the majority class.

…
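SMOTE interpolates synthetic samples between minority-class neighbors; the even simpler strategy of random oversampling (plainly duplicating rare-class samples until the classes balance) can be sketched as follows, on made-up toy data:

```python
import random
from collections import Counter

def oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until classes balance."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 0, 1]        # heavily imbalanced: 5 vs 1
X2, y2 = oversample(X, y)
print(Counter(y2))             # both classes now have 5 samples
```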

Obtain Training Set

More materials:

https://www.quora.com/In-classification-how-do-you-handle-an-unbalanced-training-set

http://stats.stackexchange.com/questions/57259/highly-unbalanced-test-data-set-and-balanced-training-data-in-classification

He, Haibo, and Edwardo A. Garcia. "Learning from Imbalanced Data." IEEE Transactions on Knowledge and Data Engineering 21, no. 9 (2009): 1263-1284.

Feature Selection

Why?

Unrelated features → noise, heavy computation

Interdependent features → redundant features

Better model

More materials:

http://machinelearningmastery.com/an-introduction-to-feature-selection/

Guyon and Elisseeff, "An Introduction to Variable and Feature Selection" (PDF)

Feature Selection

Feature selection methods:

Filter methods: apply a statistical measure to assign a score to each feature. E.g., the chi-squared test, information gain, and correlation coefficient scores.

Wrapper methods: treat the selection of a set of features as a search problem.

Embedded methods: learn which features best contribute to the accuracy of the model while the model is being created. E.g., LASSO, Elastic Net, and Ridge Regression.
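A filter method can be sketched with any per-feature statistical score; here each feature is ranked by the absolute Pearson correlation with the label, a simple stand-in for the chi-squared or information-gain scores mentioned above (the toy data are made up):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def filter_select(X, y, top=2):
    """Rank features by |correlation with the label|; keep the top ones."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[f] for row in X], y)) for f in range(n_feat)]
    return sorted(range(n_feat), key=lambda f: -scores[f])[:top]

# Feature 0 tracks the label exactly; feature 1 is weakly related noise.
X = [[0, 5], [1, 3], [0, 4], [1, 6], [0, 5], [1, 4]]
y = [0, 1, 0, 1, 0, 1]
print(filter_select(X, y, top=1))  # [0]
```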

Evaluation Method

Basic evaluation methods:

Precision

Confusion matrix

Per-class accuracy

AUC (Area Under the Curve): the ROC curve shows the sensitivity of the classifier by plotting the true-positive rate against the false-positive rate.
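These basic measures can be computed by hand; below is a sketch of a confusion matrix plus precision/recall for one positive class, on a made-up spam-filtering example:

```python
from collections import defaultdict

def confusion_matrix(y_true, y_pred):
    """cm[actual][predicted] -> count."""
    cm = defaultdict(lambda: defaultdict(int))
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def precision_recall(y_true, y_pred, positive):
    """Precision and recall with respect to one positive class."""
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    fp = sum(p == positive and t != positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

y_true = ["spam", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham",  "ham", "spam", "spam", "ham"]
p, r = precision_recall(y_true, y_pred, "spam")
print(p, r)  # 2/3 precision, 2/3 recall
```

Per-class accuracy is just the diagonal of the confusion matrix divided by each class's row total.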

Evaluation Method

Cross Validation

Random Subsampling

K-fold Cross Validation

Leave-one-out Cross Validation

Three-way data splits
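A three-way data split reserves a validation set for model selection and an untouched test set for the final performance estimate. A minimal sketch, where the 60/20/20 fractions are a common but arbitrary choice:

```python
import random

def three_way_split(X, y, frac=(0.6, 0.2, 0.2), seed=0):
    """Shuffle, then cut into train / validation / test partitions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_train = int(frac[0] * len(X))
    n_val = int(frac[1] * len(X))
    parts = (idx[:n_train],
             idx[n_train:n_train + n_val],
             idx[n_train + n_val:])
    return [([X[i] for i in p], [y[i] for i in p]) for p in parts]

X = [[i] for i in range(10)]
y = list(range(10))
train, val, test = three_way_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 6 2 2
```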

Apply the Classifier

Save the Model

Make the Model dynamic
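Saving the model so it can be applied later without retraining can be done with Python's standard pickle module; the dictionary below is a stand-in for whatever trained model object your classifier produces:

```python
import os
import pickle
import tempfile

# A stand-in "model": any picklable Python object works the same way.
model = {"weights": [0.2, -1.3, 0.7], "bias": 0.1}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)       # save once, after training
with open(path, "rb") as f:
    restored = pickle.load(f)   # reload at prediction time
print(restored == model)        # True
```

Making the model dynamic then amounts to periodically retraining (or incrementally updating) and overwriting the saved file.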

Thank you!
