
Department of Computer Science, University of Waikato, New Zealand

Eibe Frank

WEKA: A Machine Learning Toolkit

The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions

Machine Learning with WEKA

based on notes by

04/10/23 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)


WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms


WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version”, compatible with the description in the 1st edition of the data mining book
  • WEKA 3.2: “GUI version”, adds graphical user interfaces (earlier versions are command-line only)
  • WEKA 3.4 and later: on SoC Linux and ISS Windows
• This talk is based on snapshots of WEKA 3.3 … with some extra up-to-date snapshots
• The only changes are “layout” and some extras


@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

WEKA only deals with “flat” files
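
The ARFF layout above is simple enough to read with a few lines of code. The following Python sketch is not part of WEKA: `parse_arff` is a hypothetical helper, and it assumes only the subset of ARFF shown on this slide (numeric and nominal attributes, `?` for a missing value). It does not handle quoted names, string or date attributes, or sparse data.

```python
ARFF_TEXT = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs, where kind is either a
    numeric type name or a list of allowed nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@relation":
                relation = line.split(None, 1)[1]
            elif keyword == "@attribute":
                _, name, spec = line.split(None, 2)
                if spec.startswith("{"):
                    values = [v.strip() for v in spec.strip("{}").split(",")]
                    attributes.append((name, values))
                else:
                    attributes.append((name, spec.lower()))
            elif keyword == "@data":
                in_data = True
            continue
        cells = [c.strip() for c in line.split(",")]
        if len(cells) != len(attributes):
            raise ValueError("wrong number of values: " + line)
        row = []
        for cell, (name, kind) in zip(cells, attributes):
            if cell == "?":  # missing value
                row.append(None)
            elif isinstance(kind, list):  # nominal attribute
                if cell not in kind:
                    raise ValueError("%r is not a value of %s" % (cell, name))
                row.append(cell)
            elif kind in NUMERIC_TYPES:
                row.append(float(cell))
            else:
                raise ValueError("unsupported attribute type: " + kind)
        rows.append(row)
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

Each instance comes back as one flat list of values, which is what “flat” file means here: one row per instance, one column per attribute, no nesting.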


Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• BUT it may be easier to reformat your data to ARFF yourself: write a program in Python or Java, or just type the text in WordPad, making sure the format is right. Doing this also helps with understanding the data.
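
Reformatting to ARFF yourself, as suggested above, can be done in a few lines of Python. This is a minimal sketch, not a WEKA tool: `csv_to_arff` and the sample CSV are hypothetical, and the sketch assumes the first CSV row holds the attribute names, that a column is numeric when every non-missing value parses as a number, and that every other column is nominal.

```python
import csv
import io

# Hypothetical CSV input; an empty cell stands for a missing value.
SAMPLE_CSV = """\
age,sex,chest_pain_type,cholesterol,exercise_induced_angina,class
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,,no,not_present
"""


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text."""
    table = list(csv.reader(io.StringIO(csv_text)))
    header, data = table[0], table[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        # Collect the non-missing values of this column.
        values = [row[col].strip() for row in data
                  if row[col].strip() not in ("", "?")]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            nominal = sorted(set(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(nominal)))
    lines += ["", "@data"]
    for row in data:
        # ARFF writes a missing value as a question mark.
        lines.append(",".join(c.strip() if c.strip() else "?" for c in row))
    return "\n".join(lines) + "\n"


print(csv_to_arff(SAMPLE_CSV, "heart-disease-simplified"))
```

Checking the generated `@attribute` lines by eye is a quick way to spot columns whose type was guessed wrongly, for example a numeric code that should really be nominal.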


Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• You explore by trying different classifiers to see which works best for you


WEKA from ISS PC

2009

@relation ukus

@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}

@data
1,32,3, 0,20,0, UK
0,25,0, 0,12,0, UK
9,27,25, 0,84,0, UK
0,19,0, 0,24,0, UK
0,16,0, 0,14,0, UK
0,16,0, 0,12,0, UK
0,21,0, 0,38,0, UK
0,25,0, 0,34,0, UK
2,26,7, 2,3,40, UK
2,32,5, 1,59,2, UK
31,0,100, 55,0,100, US
61,0,100, 26,0,100, US
24,0,100, 11,0,100, US
12,1,92, 21,4,84, US
8,0,100, 4,2,67, US
10,0,100, 8,0,100, US
19,0,100, 22,0,100, US
14,0,100, 7,0,100, US
14,0,100, 6,0,100, US
8,5,62, 24,0,100, US

@relation test

@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}

@data
10,5,33, 0,20,0, UK
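
To see what an instance-based classifier does with the two files above, here is a minimal Python sketch of 1-nearest-neighbour classification on the same numbers. It is an illustration only, not WEKA's implementation: `squared_distance` and `predict_1nn` are hypothetical names, and the sketch assumes plain squared Euclidean distance on the raw attribute values with no normalization, so a WEKA run may behave differently.

```python
# Training instances copied from the ukus relation:
# (center, centre, centerpercent, color, colour, colorpercent), english
TRAIN = [
    ((1, 32, 3, 0, 20, 0), "UK"),
    ((0, 25, 0, 0, 12, 0), "UK"),
    ((9, 27, 25, 0, 84, 0), "UK"),
    ((0, 19, 0, 0, 24, 0), "UK"),
    ((0, 16, 0, 0, 14, 0), "UK"),
    ((0, 16, 0, 0, 12, 0), "UK"),
    ((0, 21, 0, 0, 38, 0), "UK"),
    ((0, 25, 0, 0, 34, 0), "UK"),
    ((2, 26, 7, 2, 3, 40), "UK"),
    ((2, 32, 5, 1, 59, 2), "UK"),
    ((31, 0, 100, 55, 0, 100), "US"),
    ((61, 0, 100, 26, 0, 100), "US"),
    ((24, 0, 100, 11, 0, 100), "US"),
    ((12, 1, 92, 21, 4, 84), "US"),
    ((8, 0, 100, 4, 2, 67), "US"),
    ((10, 0, 100, 8, 0, 100), "US"),
    ((19, 0, 100, 22, 0, 100), "US"),
    ((14, 0, 100, 7, 0, 100), "US"),
    ((14, 0, 100, 6, 0, 100), "US"),
    ((8, 5, 62, 24, 0, 100), "US"),
]

# The single instance from the test relation (its label there is UK).
TEST = (10, 5, 33, 0, 20, 0)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train, query):
    """Return (label, features) of the training instance closest to query."""
    features, label = min(
        train, key=lambda inst: squared_distance(inst[0], query)
    )
    return label, features


label, neighbour = predict_1nn(TRAIN, TEST)
print("predicted:", label, "nearest neighbour:", neighbour)
```

Under these assumptions the closest training instance is `(0, 16, 0, 0, 14, 0)`, so the test instance is predicted as UK, which agrees with the label in the test file.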


WEKA has more…
• Clustering data into groups
• Finding associations between attributes
• Visualisation - online analytical processing
• Experimenter to run and compare different ML algorithms
• Knowledge Flow GUI
• 3rd-party add-ons: sourceforge.net
• http://www.cs.waikato.ac.nz/ml/weka