An Exercise in Machine Learning

http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/

Cornelia Caragea

Outline

• Machine Learning Software

• Preparing Data

• Building Classifiers

• Interpreting Results

Machine Learning Software

• Suites (General Purpose)
  - WEKA (Source: Java)
  - MLC++ (Source: C++)
  - SAS
  - List from KDNuggets (Various)
• Specific
  - Classification: C4.5, SVMlight
  - Association Rule Mining
  - Bayesian Net …
• Commercial vs. Free

What does WEKA do?

• Implements state-of-the-art learning algorithms
• Main strengths: classification, regression, association rules, and clustering algorithms
• Extensible, so new learning schemes can be tried
• Large variety of handy tools (transforming datasets, filters, visualization, etc.)

WEKA resources

• API Documentation, Tutorials, Source code
• WEKA mailing list
• Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
• Weka-related Projects:
  - Weka-Parallel - parallel processing for Weka
  - RWeka - linking R and Weka
  - YALE - Yet Another Learning Environment
  - Many others…

Preparing Data

ARFF Data Format

• Header – describing the attribute types
• Data – (instances, examples) comma-separated list
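
To make the two parts concrete, here is a minimal Python sketch that splits a small ARFF text into its header (attribute declarations) and its data rows. The weather-style relation and attribute names are illustrative assumptions, not taken from the slides, and the reader only covers the simplest case (no quoted values, sparse rows, or missing-value handling).

```python
# Minimal ARFF reader sketch: handles only simple, unquoted, dense files.
# The relation and attributes below are made up for illustration.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

def parse_arff(text):
    """Return (attributes, rows); attributes is a list of (name, type) pairs."""
    attributes, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Data section: one comma-separated instance per line.
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            # Header section: "@attribute <name> <type>".
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows

attributes, rows = parse_arff(ARFF_TEXT)
```

The last attribute (`play` here) plays the role of the class attribute in the later slides.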

Launching WEKA

java -jar weka.jar

Load Dataset into WEKA

Data Filters

• Useful support for data preprocessing
  - Removing or adding attributes, resampling the dataset, removing examples, etc.
• Filters can also create stratified cross-validation folds of the given dataset, so that class distributions are approximately retained within each fold.
• Typically, split the data 2/3 for training and 1/3 for testing.
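
One way stratified folds could be built is sketched below in Python: group instance indices by class, shuffle within each class, then deal them round-robin into the folds. This is an illustrative sketch, not WEKA's filter implementation; the function name `stratified_folds` and the toy labels are assumptions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                 # randomize order within the class
        for index in indices:
            folds[position % k].append(index)  # deal round-robin across folds
            position += 1
    return folds

labels = ["yes"] * 6 + ["no"] * 3
folds = stratified_folds(labels, k=3)
```

With `k=3`, holding out one fold for testing and training on the other two gives the 2/3 and 1/3 split mentioned above, with the 2:1 class ratio preserved in each part.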

Data Filters

Building Classifiers

A classifier model is a mapping from dataset attributes to the class (target) attribute. How the model is created, and the form it takes, differ between algorithms.

Decision Tree and Naïve Bayes Classifiers

Which one is the best? No Free Lunch!

Building Classifiers

(1) weka.classifiers.rules.ZeroR

• Class for building and using a 0-R classifier
• Majority class classifier
• Predicts the mean (for a numeric class) or the mode (for a nominal class)
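
The idea behind a 0-R baseline fits in a few lines. The Python sketch below ignores all attributes and returns one constant prediction; it illustrates the rule only and is not WEKA's `ZeroR` class. The function name `zero_r` is an assumption.

```python
from collections import Counter

def zero_r(class_values):
    """Return the constant prediction a 0-R style baseline would make."""
    if all(isinstance(v, (int, float)) for v in class_values):
        return sum(class_values) / len(class_values)    # numeric class: mean
    return Counter(class_values).most_common(1)[0][0]   # nominal class: mode

nominal_prediction = zero_r(["yes", "yes", "no", "yes", "no"])
numeric_prediction = zero_r([1.0, 2.0, 6.0])
```

Any other classifier should beat this baseline to be worth using.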

Exercise 1

http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html

(2) weka.classifiers.bayes.NaiveBayes

Class for building a Naive Bayes classifier
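
As a rough illustration of what a Naive Bayes classifier computes, the Python sketch below handles nominal attributes only and uses add-one (Laplace) smoothing. It is a simplified stand-in, not WEKA's implementation; the function names and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    domains = defaultdict(set)            # attribute index -> observed values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Pick the class with the highest log posterior (up to a constant)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)               # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(domains[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("rainy", "cool"))
```

The "naive" part is the assumption that attributes are independent given the class, which is why the per-attribute log probabilities are simply added.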

(3) weka.classifiers.trees.J48

Class for generating a pruned or unpruned C4.5 decision tree
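
C4.5 chooses which attribute to split on using the gain ratio. The Python sketch below computes it for a single nominal attribute; it shows the criterion only, not tree construction, and the function names are assumptions rather than anything from J48.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain of a nominal attribute divided by its split information."""
    total = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [l for a, l in zip(attribute_values, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)   # entropy of the split itself
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
perfect = gain_ratio(["a", "a", "b", "b"], labels)   # separates the classes
useless = gain_ratio(["a", "b", "a", "b"], labels)   # tells us nothing
```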

Test Options

• Percentage Split (2/3 Training; 1/3 Testing)
• Cross-validation
  - Estimates the generalization error by resampling when data is limited; the error estimates from the folds are averaged.
  - stratified
  - 10-fold
  - leave-one-out (LOO)
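
The averaging step can be sketched in Python as follows. To keep the example self-contained, the "classifier" is a majority-class baseline and folds are formed by taking every k-th instance, with no shuffling or stratification; both choices are simplifying assumptions. Setting `k` to the number of instances gives leave-one-out.

```python
from collections import Counter

def cross_validated_error(labels, k):
    """Average error of a majority-class baseline over k folds.

    Fold i holds every k-th instance starting at i; k == len(labels)
    gives leave-one-out. No shuffling or stratification is done here.
    """
    fold_errors = []
    for i in range(k):
        test = labels[i::k]
        train = [l for j, l in enumerate(labels) if j % k != i]
        majority = Counter(train).most_common(1)[0][0]   # "train" the baseline
        wrong = sum(1 for l in test if l != majority)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / k                          # averaged estimate

labels = ["yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"]
three_fold_error = cross_validated_error(labels, k=3)
leave_one_out_error = cross_validated_error(labels, k=len(labels))
```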

Understanding Output

Decision Tree Output (1)

Decision Tree Output (2)

Exercise 2

http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex2.html

Performance Measures

• Accuracy & Error rate
• Confusion matrix – contingency table
• True Positive rate & False Positive rate (Area under Receiver Operating Characteristic)
• Precision, Recall & F-Measure
• Sensitivity & Specificity
• For more information on these, see uisp09-Evaluation.ppt
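
For a two-class problem, most of these measures follow directly from the four cells of the confusion matrix. The Python sketch below uses the standard definitions; the counts are made up for illustration, and the function does not guard against empty rows or columns (division by zero).

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive the listed measures from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)             # also the TP rate and sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),      # equals 1 - specificity
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "sensitivity": recall,
        "specificity": specificity,
    }

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

The area under the ROC curve is not computed here, since it needs ranked scores rather than a single confusion matrix.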

Decision Tree Pruning

• Overcome over-fitting
• Pre-pruning and post-pruning
• Reduced error pruning
• Subtree raising with different confidence

Comparing tree size and accuracy

Subtree replacement

• Bottom-up: a tree is considered for replacement once all its subtrees have been considered
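
The bottom-up order can be sketched in Python as a recursive function: prune the children first, then decide whether the whole subtree should become a single leaf. This sketch uses a reduced-error style test on a hold-out set and labels the new leaf with the hold-out majority class, both simplifications; it is not how J48 decides by default. The dict-based tree layout and all names are assumptions.

```python
from collections import Counter

def classify(node, row):
    """Follow attribute tests down to a leaf and return its class label."""
    while isinstance(node, dict):
        node = node["children"][row[node["attribute"]]]
    return node

def errors(node, rows, labels):
    """Count hold-out instances that the (sub)tree misclassifies."""
    return sum(1 for row, label in zip(rows, labels) if classify(node, row) != label)

def prune(node, rows, labels):
    """Bottom-up subtree replacement on a hold-out set (reduced-error style)."""
    if not isinstance(node, dict) or not rows:
        return node
    attribute = node["attribute"]
    # First consider every subtree, using only the instances that reach it.
    for value, child in list(node["children"].items()):
        reached = [(r, l) for r, l in zip(rows, labels) if r[attribute] == value]
        node["children"][value] = prune(
            child, [r for r, _ in reached], [l for _, l in reached])
    # Then consider replacing this whole subtree with a single majority leaf.
    leaf = Counter(labels).most_common(1)[0][0]
    return leaf if errors(leaf, rows, labels) <= errors(node, rows, labels) else node

tree = {"attribute": 0, "children": {
    "sunny": {"attribute": 1, "children": {"hot": "no", "mild": "yes"}},
    "rainy": "yes",
}}
holdout_rows = [("sunny", "hot"), ("sunny", "mild"), ("sunny", "mild"),
                ("rainy", "mild"), ("rainy", "hot")]
holdout_labels = ["no", "no", "no", "yes", "yes"]
pruned = prune(tree, holdout_rows, holdout_labels)
```

Here the inner test on attribute 1 is replaced by the leaf `"no"`, while the root split is kept because replacing it would increase the hold-out error.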

Subtree Raising

• Deletes node and redistributes instances
• Slower than subtree replacement

Exercise 3

http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex3.html