AS714 Data Mining
DATA MINING TOOL WEKA
Group members (student IDs): 5020428005, 5020428006, 5020428012, 5020428016

AS714 Final Project, Group 10, Ver2.0


WEKA 3.6.1 Manual

AS 714, 1/2552

This manual covers Knowledge Discovery in Databases (KDD) and Data Mining, and introduces Weka, an open source data mining software package.

Group 10 (DTM#2)
19 2552

1 Download WEKA

Step 1: Open the address http://www.cs.waikato.ac.nz/ml/weka/ in a web browser (Figure 1).

Step 2: Click Download.

Step 3: Choose the Windows download (Figure 2). Under the stable GUI version, the Windows version is weka-3-6-1jre.exe; click the "here" link next to it.

Step 4: The web browser opens the downloads page and shows a pop-up. Use the direct link or choose a mirror (Figure 3), then click Use this mirror (Figure 4).

Step 5: In the Downloads dialog (Figure 5), click Run to install WEKA immediately, or click Save to save weka-3-6-1jre.exe to the hard disk and install it later. Click Cancel to abandon the download.

Wait for the download to finish before installing (Figures 6 to 8).

2 Install WEKA

Step 1: In this example the Weka 3.6.1 installer was saved to drive G:. Open My Computer and go to G:\ (Figure 9).

Step 2: In G:\, double-click weka-3-6-1jre (Figure 10).

Step 3: The Weka 3.6.1 setup window opens. Click Next (Figure 11), then click I Agree.

Click Next, then Install (Figures 12 and 13).

Set the install location to C:\ and click Next (Figure 14), then click Install.

Figures 15 to 18 show the installation in progress. Cancel stops it.

The J2SE (Java) installer then runs (Figure 19). Choose Typical and click Accept (Figure 20). The install location is C:\ (Figure 21). Click Finish (Figures 22 and 23).

Weka 3.6.1 is now installed (Figures 24 and 25).

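3 About WEKA

WEKA stands for Waikato Environment for Knowledge Analysis. WEKA is software that is free to download under the GNU General Public License. It is written in Java, implements machine learning methods, and has a Graphic User Interface (GUI).

WEKA can read data from:
1. ASCII files: arff, csv, C45
2. A URL
3. A database, through JDBC

ARFF
1. ARFF = Attribute-Relation File Format
2. An ARFF file is an ASCII text file.

Structure of an ARFF file:
@relation name
@attribute att-name type
@data

The type is numeric, real, or a nominal list written in braces: {v1, v2, ..., vn}.

Creating an ARFF file
o Open the data as a text file in Notepad.
o Add @relation relation_name.
o Add one @attribute att_name value line for each attribute.
o Add @data, followed by the data rows, for example:
@data
1,2,3,4

Example: sample01.csv

ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C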
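The manual steps above for turning a CSV file into ARFF can be sketched in Python. This is a minimal illustration, not how Weka's own CSV loader works: the function name csv_to_arff, the helper is_number, and the rule "a column is numeric if every value parses as a number, otherwise nominal" are assumptions made for this example.

```python
import csv
import io

# The first seven rows of sample01.csv as shown above.
SAMPLE_CSV = """ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C
"""


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation_name):
    """Build ARFF text from CSV text (hypothetical helper, simplified).

    Attribute names containing spaces would need quoting in real ARFF;
    that case is not handled here.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation_name, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: distinct values in order of first appearance.
            distinct = list(dict.fromkeys(values))
            lines.append("@attribute " + name + " {" + ",".join(distinct) + "}")
    lines.append("")
    lines.append("@data")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines) + "\n"


arff_text = csv_to_arff(SAMPLE_CSV, "sample01")
print(arff_text)
```

Note that this sketch treats ID as numeric because its values are numbers; in practice an identifier column is usually removed before mining.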

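Univariate statistics in Weka
o Weka reports basic statistics (Univariate Statistic) for each attribute, depending on whether the attribute is Nominal or Numeric.

SEX
o Attribute name: SEX
o Type: Nominal
o Count of each value: M 5, F 5

SCORE
o Attribute name: Score
o Type: Numeric
o Number of values: 10
o Statistics:

Minimum = 10

Maximum = 89

Mean = 48.728

StdDev = 26.585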
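The statistics above can be reproduced outside Weka as a check. The sketch below uses the ten records of sample01.csv listed in the Appendix; the variable names are chosen for this example, and the use of the sample standard deviation (dividing by n - 1) is an assumption that happens to match the StdDev value reported above.

```python
import statistics
from collections import Counter

# SEX and Score columns of the ten records in sample01.csv (see Appendix).
sex = ["M", "F", "M", "F", "M", "F", "M", "F", "M", "F"]
score = [45.5, 56.78, 89, 77, 32, 12, 35, 62, 68, 10]

# Nominal attribute: count of each distinct value.
sex_counts = Counter(sex)

# Numeric attribute: minimum, maximum, mean, and sample standard deviation.
score_summary = {
    "count": len(score),
    "minimum": min(score),
    "maximum": max(score),
    "mean": statistics.mean(score),
    "stddev": statistics.stdev(score),  # divides by n - 1
}

print(dict(sex_counts))
for name, value in score_summary.items():
    print(name, "=", round(value, 3))
```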

Starting Explorer in WEKA 3.6.1

Double-click the WEKA icon (Figure 26), or choose Start > Program > Weka 3.6.1 > Weka 3.6 (Figure 27).

The main window of WEKA 3.6.1 (Weka GUI Chooser) has 2 parts (Figure 28):

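Applications
1. Explorer: an environment for exploring data through a GUI (Graphical User Interface).
2. Experimenter: an environment for running experiments and statistical tests that compare learning schemes.
3. KnowledgeFlow: offers essentially the same functions as Explorer through a drag-and-drop interface.
4. Simple CLI: a simple command-line interface for running WEKA commands directly.

Menu bar
1. Program
- LogWindow: opens a log window that captures everything printed to stdout and stderr.
- Memory usage: shows memory usage (Figure 29).
- Exit: closes WEKA.
2. Visualization: tools for visualizing data in Weka (Figure 30).
- Plot: draws a 2D plot of a dataset.
- ROC: displays a ROC (receiver operating characteristic) curve (Figure 31).
- TreeVisualizer: displays directed graphs, for example a decision tree (Figure 32).
- GraphVisualizer: displays graphs in XML BIF or DOT format, for example Bayesian networks.
- BoundaryVisualizer: visualizes classifier decision boundaries in two dimensions (Figure 33).
3. Tools
- ArffViewer: an MDI (Multiple Document Interface) application for viewing ARFF files in spreadsheet form.
- SqlViewer: a worksheet for running SQL queries against a database.
- Bayes net editor: an application for editing, visualizing, and learning Bayes nets.
4. Help: links to WEKA resources (Figure 34).
- Weka homepage: opens the WEKA homepage in a browser (http://www.cs.waikato.ac.nz/~ml/weka/).
- HOWTOs, code snippets, etc.: opens the Weka Wiki (http://weka.wiki.sourceforge.net/).
- Weka on Sourceforge: opens the WEKA project page on Sourceforge.net (http://sourceforge.net/projects/weka/).
- SystemInfo: lists information about the Java/WEKA environment, for example the CLASSPATH (Figure 35).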

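4 Explorer User Interface

Section Tabs (Figure 36)

1. Preprocess: choose and modify the data to work on.
2. Classify: train and test learning schemes that classify or perform regression.
3. Cluster: learn clusters for the data.
4. Associate: learn association rules for the data.
5. Select attributes: select the most relevant attributes in the data.
6. Visualize: view an interactive 2D plot of the data.

Other parts of the Explorer window (Figure 37)

Status Box: shows messages about what Weka is currently doing.

Log Button: opens the log of actions Weka has performed.

Weka bird icon: shows whether Weka is idle or busy.

Graphical output

Loading Data

1. Preprocessing

1. Open file: load a data file from the hard disk (Figure 38).
2. Open URL: load data from a web address (Figure 39).
3. Open DB: read data from a database (Figure 40).
4. Generate: click choose to select a DataGenerator.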

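Figure 41 shows the DataGenerator dialog.

Working with filters

Weka filters (Filters) fall into 2 groups (Figure 42):

Supervised: attribute filters and instance filters

Unsupervised: attribute filters and instance filters. For example, the attribute filter Remove deletes the selected attributes (Figure 43).

Example: on the Preprocess tab, click Open file and choose weather.arff.

The data has 5 attributes (outlook, temperature, humidity, windy, play) and 14 instances. The attribute outlook is Nominal with 3 values:

sunny 5
overcast 4
rainy 5

In Weka, click Visualize all to show the distribution of every attribute in weather.arff.

On the Weka Visualize tab, the data is shown as a Scatter plot matrix. Adjust PlotSize and PointSize, then click Update.

Scatter Plot

To plot only some attributes, click Select Attributes, choose the attributes (hold Ctrl to select several), then click Update.

o Linear regression in Weka

On the Classify tab, under Classifier click Choose and pick LinearRegression from the functions group. Under Test Option select Use Training Set, choose a numeric (Num) attribute as the class, and click Start. The result appears in Classifier Output.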

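2. Classification

Figure 44 shows the Classify tab.

Classifier

The classifiers are grouped as follows (Figure 45): bayes, functions, lazy, meta, misc, trees, rules.

3. Clustering

Figure 46 shows the Cluster tab.

Class attribute in the linear regression example: Petallength

o Logistic Regression in Weka

On the Classify tab, under Classifier click Choose and pick Logistic from the functions group. Under Test Option select Use Training Set, choose a nominal (Nom) attribute as the class, and click Start. The result appears in Classifier Output.

Class attribute in the logistic regression example: Play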

4. Associate

Click Choose to select an Associator, then click Start (Figure 47).

5. Select Attribute

Figures 48 and 49 show the Select attributes tab.

6. Visualize

Figures 50 and 51 show the Visualize tab.

Appendix

Weka example data: sample01.csv

ID,SEX,PASS/FAIL,Score,Class
1,M,Pass,45.5,B
2,F,Pass,56.78,B
3,M,Pass,89,A
4,F,Pass,77,A
5,M,Fail,32,C
6,F,Fail,12,D
7,M,Fail,35,C
8,F,Pass,62,B
9,M,Pass,68,B+
10,F,Fail,10,D

Filters in Weka
o Weka filters (Filters) come in 2 types:
o Supervised: attribute filters and instance filters
o Unsupervised: attribute filters and instance filters, for example Remove

Supervised filters
o Attribute: AttributeSelection, ClassOrder, Discretize, NominalToBinary
o Instance: Resample, SpreadSubsample, StratifiedRemoveFolds

AttributeSelection
o Set evaluator and search.
o Click OK.
o Click Apply.

ClassOrder
o Set classOrder and seed.
o Click OK, then Apply.

Discretize
o Set attributeIndices. The other options are explained in the filter's Help.
o Click OK.
o Click Apply.

NominalToBinary
o Converts Nominal attributes to Binary attributes.
o Click OK, then Apply.

Resample
o Set sampleSizePercent.
o Click OK, then Apply.

SpreadSubsample
o Set distributionSpread.
o Click OK, then Apply.

StratifiedRemoveFolds
o Set fold.
o Click OK.
o Click Apply.

Unsupervised filters
o The Help in Weka lists the following filters.
o Attribute: Add, AddCluster, AddExpression, AddNoise, ClusterMembership, Copy, Discretize, FirstOrder, MakeIndicator, MergeTwoValues, NominalToBinary, Normalize, NumericToBinary, NumericTransform, Obfuscate, PKIDiscretize, RandomProjection, Remove, RemoveType, RemoveUseless, ReplaceMissingValues, Standardize, StringToNominal, StringToWordVector, SwapValues, TimeSeriesDelta, TimeSeriesTranslate
o Instance: Normalize, NonSparseToSparse, Randomize, RemoveFolds, RemoveMisclassified, RemovePercentage, RemoveRange, RemoveWithValues, Resample, SparseToNonSparse

The following attribute filters are described in this section:
o Add filter
o AddExpression filter
o NominalToBinary filter
o NumericToBinary filter
o NumericTransform filter
o Remove filter
o ReplaceMissingValues filter
o Standardize filter
o AddCluster filter
o Discretize filter
o Normalize filter
o RemoveType filter

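Add filter

o Add creates a new attribute whose values are all missing value.
o Click OK.
o Click Apply.

AddCluster filter

o AddCluster adds a cluster attribute produced by a clusterer, for example SimpleKMeans.
o Set ignoredAttributeIndices to the attributes the clusterer should ignore.
o Click OK.
o Click Apply.

AddExpression filter

o AddExpression adds a new attribute computed from an expression.
o Click OK.
o Click Apply.

Discretize filter

o Set attributeIndices to the attributes to discretize.
o Set bins to the number of intervals.
o Choose equal width or equal depth binning. With useEqualFrequency set to False, the bins have equal width.
o Click OK.
o Click Apply to see the result of Discretize.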
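Equal-width binning, the behaviour described for useEqualFrequency = False, can be sketched in a few lines. This is a simplified illustration rather than Weka's implementation: the function name equal_width_bins is hypothetical, the choice of 3 bins is only for the example, and the handling of values that fall exactly on a cut point may differ from Weka's. The values are the Score column of sample01.csv.

```python
def equal_width_bins(values, bins):
    """Assign each value a bin index from 0 to bins - 1 using equal-width intervals."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / bins
    if width == 0:
        # All values are identical, so everything goes in the first bin.
        return [0 for _ in values]
    result = []
    for value in values:
        index = int((value - lowest) / width)
        # The maximum value would land in bin number `bins`; clamp it to the last bin.
        result.append(min(index, bins - 1))
    return result


score = [45.5, 56.78, 89, 77, 32, 12, 35, 62, 68, 10]
score_bins = equal_width_bins(score, 3)
print(score_bins)
```

With a minimum of 10 and a maximum of 89, each of the 3 bins is about 26.33 wide, so the cut points fall near 36.33 and 62.67.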

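MergeTwoValues filter

o MergeTwoValues merges two values of a nominal attribute into one.
o Set attributeIndex to the attribute.
o Set firstValueIndex and secondValueIndex to the two values to merge.
o Click OK.
o Click Apply.

NominalToBinary filter

o NominalToBinary converts nominal attributes to binary attributes with values 0 and 1.
o Set attributeIndices.
o Click OK.
o Click Apply.

Normalize filter

o Normalize rescales numeric values to the range 0-1.
o Click Apply.

NumericToBinary filter

o NumericToBinary converts numeric attributes to binary attributes with values 0 and 1: a value of 0 stays 0, and any value other than 0 becomes 1.
o Click Apply.

NumericTransform filter

o NumericTransform applies a function to numeric attributes, for example abs.
o Click OK.
o Click Apply.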
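The three numeric filters above, Normalize, NumericToBinary, and NumericTransform with abs, each apply a simple per-value rule. The sketch below shows those rules on a small hypothetical list of values. The function names are made up for this example and are not Weka API calls, and missing values are not handled.

```python
def normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def numeric_to_binary(values):
    """Map 0 to 0 and every non-zero value to 1."""
    return [0 if v == 0 else 1 for v in values]


def numeric_transform_abs(values):
    """Apply abs to every value."""
    return [abs(v) for v in values]


sample = [-4, 0, 2, 6]  # hypothetical values, not taken from the manual
normalized = normalize(sample)
binary = numeric_to_binary(sample)
absolute = numeric_transform_abs(sample)
print(normalized, binary, absolute)
```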

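Remove filter

o Remove deletes the attributes listed in attributeIndices.
o Click OK.
o Click Apply.

RemoveType filter

o RemoveType deletes all attributes of the type set in attributeType.
o Click OK.
o Click Apply.

ReplaceMissingValues filter

o ReplaceMissingValues replaces missing values with the mean (numeric attributes) or the mode (nominal attributes).

Standardize filter

o Standardize converts numeric values to z-score (mean 0, standard deviation 1).
o Click OK.
o Click Apply.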
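A z-score is the value minus the attribute mean, divided by the attribute standard deviation. The sketch below applies that to the Score column of sample01.csv. The function name standardize is hypothetical, and using the sample standard deviation (n - 1) is an assumption for this example.

```python
import statistics


def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    stddev = statistics.stdev(values)  # sample standard deviation (assumption)
    if stddev == 0:
        return [0.0 for _ in values]
    return [(v - mean) / stddev for v in values]


score = [45.5, 56.78, 89, 77, 32, 12, 35, 62, 68, 10]
z_scores = standardize(score)
print([round(z, 3) for z in z_scores])
```

After the transformation the column has mean 0 and standard deviation 1, so attributes measured on different scales become comparable.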

The following instance filters are described in this section:
o Randomize
o RemoveFolds
o RemovePercentage
o RemoveRange
o RemoveWithValues
o Resample

Randomize filter

o Randomize shuffles the order of the instances.
o Click OK.
o Click Apply.

RemoveFolds filter

o RemoveFolds: set numFolds.
o Click Save.
o Click OK.
o Click Apply.

RemovePercentage filter

o RemovePercentage: set percentage to the share of instances to remove.
o Click OK.
o Click Apply.

RemoveRange filter

o RemoveRange: set instancesIndices to the range of instances to remove.
o Click OK.
o Click Apply.

RemoveWithValues filter

o RemoveWithValues: set attributeIndex to the attribute to test.
o Set splitPoint.
o Click OK.
o Click Apply.

Resample filter

o Resample: set sampleSizePercent.
o Click Save.
o Click OK.
o Click Apply.

o In summary, Weka filters (Filters) come in two types, Supervised and Unsupervised.

Association rules in Weka

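o Association rules are commonly used for Market Basket analysis.
o The data is a set of Transactions.
o Each transaction has a TID and a list of items.

o For Weka, code each item as a Boolean attribute: y (or 1) if the item is in the transaction, and ? (missing value) if it is not. For example, the transaction T100,I1,I2 becomes

T100, 1, 1, ?, ?, ?

The coded data is saved as market.arff.

Apriori
o Open the Associate tab.
o Under Associator, choose Apriori.

Apriori parameters
o min support: set lowerBoundMinSupport to 0.2 (20%).
o min confidence: set minMetric to 0.5 (50%) with metricType set to Confidence.
o numRules: 100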
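To make the min support and min confidence thresholds concrete, the sketch below computes both measures for simple one-item rules. The transactions are hypothetical and stand in for market.arff, whose contents are not listed in this manual. The code enumerates item pairs by brute force and does not implement the candidate pruning that the Apriori algorithm itself uses.

```python
from itertools import permutations

# Hypothetical transactions (assumption: not the actual contents of market.arff).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]

MIN_SUPPORT = 0.2      # same value as lowerBoundMinSupport above
MIN_CONFIDENCE = 0.5   # same value as minMetric above


def support(itemset, data):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in data if itemset <= t) / len(data)


def confidence(antecedent, consequent, data):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, data) / support(antecedent, data)


items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    rule_support = support({a, b}, transactions)
    if rule_support < MIN_SUPPORT:
        continue
    rule_confidence = confidence({a}, {b}, transactions)
    if rule_confidence >= MIN_CONFIDENCE:
        rules.append((a, b, rule_support, rule_confidence))

# List the strongest rules first.
for a, b, s, c in sorted(rules, key=lambda r: -r[3]):
    print(a, "==>", b, "support", round(s, 3), "confidence", round(c, 3))
```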

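Apriori output

Weka finds 16 rules for market.arff, for example:

1: I5 ==> I1
2: I4 ==> I2

o Data that is not in transaction form can also be used when its attributes are Nominal or Ordinal.
o Weka applies dummy coding to Nominal and Ordinal attributes. For example, the attribute outlook in weather.nominal.arff has the values overcast, sunny, and rainy, which become the items

outlook = overcast, outlook = sunny, outlook = rainy

Weka finds 8 rules for weather.nominal.arff, for example:

1: overcast ==> play = yes
2: cool ==> (normal)
3: windy = FALSE ==> play = yes

Summary
o Attributes must be Nominal or Ordinal.
o Transaction data is coded as Nominal attributes with ? (missing value), in the form TID, attri_1, attri_2, ..., attri_n, where attri_i is y if the item is in transaction TID and ? if it is not.
o On the Associate tab, choose Apriori as the Associator.
o Set min support, min confidence, and numRules.

Classification in Weka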

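o Decision tree algorithms covered here:
o ID3
o J48

o Classifier groups in Weka
o Bayes, Functions, Lazy, Meta, Misc, Trees, Rules

Example data: Weather.nominal.arff, with 4 predictor attributes and 14 instances.

o Ways to evaluate a classifier:
o k-fold cross validation, and leave-one-out
o Splitting the data into Validation, Test data, and Training data, for example 3/10, 3/10, and 4/10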
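The idea behind k-fold cross validation is that every instance is used for testing exactly once and for training in the other k - 1 rounds. Leave-one-out is the special case where k equals the number of instances. The sketch below builds the index splits for 14 instances. The function name k_fold_indices is hypothetical, and the sketch assigns folds in a fixed round-robin order, whereas a tool such as Weka may shuffle and stratify the data first.

```python
def k_fold_indices(n_instances, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    # Round-robin assignment: instance i goes to fold i % k.
    folds = [list(range(i, n_instances, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), test))
    return splits


# 7-fold cross validation on 14 instances: each fold tests 2 and trains on 12.
splits = k_fold_indices(14, 7)

# Leave-one-out: k equals the number of instances.
leave_one_out = k_fold_indices(14, 14)

for train, test in splits:
    print("train", train, "test", test)
```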

Using Weka Explorer

o Open Weka Explorer.
o Open Weather.nominal.arff.
o Set the Filter if one is needed, then go to the Classify tab.

The attributes have the following numbers of values:

Outlook 3

temperature 3

humidity 2

windy 2

Click choose and pick a classifier under classifiers > trees.

Select use training set.

When the training data is used for testing, the Confusion matrix shows the correctly classified instances on the diagonal.

weather.arff

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
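ID3 chooses the attribute to split on by information gain, which is the reduction in entropy of the class after splitting. As a rough illustration, the sketch below computes the gain for the two nominal attributes of the weather.arff listing above, outlook and windy, with play as the class. The function names are chosen for this example and this is not Weka's Id3 code.

```python
import math
from collections import Counter

# (outlook, windy, play) for the 14 instances of weather.arff listed above.
ROWS = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "TRUE", "yes"), ("sunny", "FALSE", "no"),
    ("sunny", "FALSE", "yes"), ("rainy", "FALSE", "yes"),
    ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting on one attribute."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gain_outlook = information_gain(ROWS, 0)
gain_windy = information_gain(ROWS, 1)
print("gain(outlook) =", round(gain_outlook, 4))
print("gain(windy) =", round(gain_windy, 4))
```

Of these two attributes, outlook has the higher gain, so it is the better first split.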

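o ID3 needs nominal attributes, so first apply the Discretize Filter: under Filter, choose filters > unsupervised > attribute > Discretize.
o Set bins to 3.
o Click OK.
o Click Apply.

ID3

o To run Id3, open the Classify tab, click Choose, and pick classifiers > trees > Id3.
o Under Test option, select Use Training set.
o Click Start.
o The ID3 result appears in the output.

On the training set, Id3 classifies play 100% correctly.

=== Confusion Matrix ===
a b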