
Models in WEKA

NAME
weka.classifiers.bayes.AODE

SYNOPSIS
AODE achieves highly accurate classification by averaging over all of a small space of alternative naive-Bayes-like models that have weaker (and hence less detrimental) independence assumptions than naive Bayes. The resulting algorithm is computationally efficient while delivering highly accurate classification on many learning tasks.

For more information, see

G. Webb, J. Boughton & Z. Wang (2004). Not So Naive Bayes. To be published in Machine Learning.

G. Webb, J. Boughton & Z. Wang (2002). Averaged One-Dependence Estimators: Preliminary Results. AI2002 Data Mining Workshop, Canberra.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.bayes.BayesNet

SYNOPSIS
Bayes Network learning using various search algorithms and quality measures.

OPTIONS
BIFFile -- Set the name of a file in BIF XML format. A Bayes network learned from data can be compared with the Bayes network represented by the BIF file. Statistics calculated include the number of missing and extra arcs.

debug -- If set to true, classifier may output additional info to the console.

estimator -- Select Estimator algorithm for finding the conditional probability tables of the Bayes Network.

searchAlgorithm -- Select method used for searching network structures.

useADTree -- Using ADTrees (the data structure for speeding up counts, not to be confused with the classifier of the same name) typically reduces learning time. However, because ADTrees are memory intensive, memory problems may occur. Switching this option off makes the structure learning algorithms slower, but lets them run with less memory. By default, ADTrees are used.

NAME
weka.classifiers.bayes.ComplementNaiveBayes

SYNOPSIS
Class for building and using a Complement class Naive Bayes classifier. For more information see the ICML-2003 paper "Tackling the Poor Assumptions of Naive Bayes Text Classifiers".


P.S.: TF, IDF and length normalization transforms, as described in the paper, can be performed through weka.filters.unsupervised.attribute.StringToWordVector.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

normalizeWordWeights -- Normalizes the word weights for each class.

smoothingParameter -- Sets the smoothing parameter to avoid zero WordGivenClass probabilities (default=1.0).

NAME
weka.classifiers.bayes.NaiveBayes

SYNOPSIS
Class for a Naive Bayes classifier using estimator classes. Numeric estimator precision values are chosen based on analysis of the training data. For this reason, the classifier is not an UpdateableClassifier (which in typical usage is initialized with zero training instances) -- if you need the UpdateableClassifier functionality, use the NaiveBayesUpdateable classifier. The NaiveBayesUpdateable classifier will use a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances.

For more information on Naive Bayes classifiers, see

George H. John and Pat Langley (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. pp. 338-345. Morgan Kaufmann, San Mateo.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

useKernelEstimator -- Use a kernel estimator for numeric attributes rather than a normal distribution.

useSupervisedDiscretization -- Use supervised discretization to convert numeric attributes to nominal ones.
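
As a usage sketch (not part of the original WEKA documentation): assuming a recent WEKA 3.x Java API and a placeholder dataset file "iris.arff", a NaiveBayes model with the useKernelEstimator option can be trained and cross-validated like this:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset (file name is a placeholder) and mark the last attribute as the class.
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();
        nb.setUseKernelEstimator(true); // kernel density instead of a normal distribution

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}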

NAME
weka.classifiers.bayes.NaiveBayesMultinomial

SYNOPSIS
Class for building and using a multinomial Naive Bayes classifier. For more information, see

Andrew McCallum, Kamal Nigam (1998). A Comparison of Event Models for Naive Bayes Text Classification.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.bayes.NaiveBayesSimple

SYNOPSIS
Class for building and using a simple Naive Bayes classifier. Numeric attributes are modelled by a normal distribution. For more information, see


Richard Duda and Peter Hart (1973). Pattern Classification and Scene Analysis. Wiley, New York.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.bayes.NaiveBayesUpdateable

SYNOPSIS
Class for a Naive Bayes classifier using estimator classes. This is the updateable version of NaiveBayes. This classifier will use a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances.

For more information on Naive Bayes classifiers, see

George H. John and Pat Langley (1995). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. pp. 338-345. Morgan Kaufmann, San Mateo.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

useKernelEstimator -- Use a kernel estimator for numeric attributes rather than a normal distribution.

useSupervisedDiscretization -- Use supervised discretization to convert numeric attributes to nominal ones.

NAME
weka.classifiers.functions.LeastMedSq

SYNOPSIS
Implements least median squared linear regression, utilising the existing weka LinearRegression class to form predictions. Least squared regression functions are generated from random subsamples of the data. The least squared regression with the lowest median squared error is chosen as the final model.

The basis of the algorithm is

Peter J. Rousseeuw and Annick M. Leroy (1987). Robust Regression and Outlier Detection.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

randomSeed -- Set the seed for selecting random subsamples of the training data.

sampleSize -- Set the size of the random samples used to generate the least squared regression functions.

NAME
weka.classifiers.functions.LinearRegression

3 of 35

Page 4: NAME - BioQUEST · Web viewClass for generating a decision tree with naive Bayes classifiers at the leaves. For more information, see Ron Kohavi (1996). Scaling up the accuracy of

SYNOPSIS
Class for using linear regression for prediction. Uses the Akaike criterion for model selection, and is able to deal with weighted instances.

OPTIONS
attributeSelectionMethod -- Set the method used to select attributes for use in the linear regression. Available methods are: no attribute selection, attribute selection using M5's method (step through the attributes removing the one with the smallest standardised coefficient until no improvement is observed in the estimate of the error given by the Akaike information criterion), and a greedy selection using the Akaike information metric.

debug -- Outputs debug information to the console.

eliminateColinearAttributes -- Eliminate colinear attributes.

ridge -- The value of the Ridge parameter.
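
A minimal sketch (not from the original document), assuming a recent WEKA 3.x API and a placeholder dataset "housing.arff" with a numeric class: build the model, print the fitted coefficients, and predict for one instance.

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LinearRegressionDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("housing.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);     // numeric class attribute

        LinearRegression lr = new LinearRegression();
        lr.setRidge(1.0e-8);             // the ridge option described above
        lr.buildClassifier(data);

        System.out.println(lr);          // prints the fitted linear model
        System.out.println("Prediction for first instance: "
                + lr.classifyInstance(data.instance(0)));
    }
}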

NAME
weka.classifiers.functions.Logistic

SYNOPSIS
Class for building and using a multinomial logistic regression model with a ridge estimator.

There are some modifications, however, compared to the paper by le Cessie and van Houwelingen (1992):

If there are k classes for n instances with m attributes, the parameter matrix B to be calculated will be an m*(k-1) matrix.

The probability for class j with the exception of the last class is

Pj(Xi) = exp(Xi*Bj)/((sum[j=1..(k-1)]exp(Xi*Bj))+1)

The last class has probability

1-(sum[j=1..(k-1)]Pj(Xi)) = 1/((sum[j=1..(k-1)]exp(Xi*Bj))+1)

The (negative) multinomial log-likelihood is thus:

L = -sum[i=1..n]{sum[j=1..(k-1)](Yij * ln(Pj(Xi)))+(1 - (sum[j=1..(k-1)]Yij)) * ln(1 - sum[j=1..(k-1)]Pj(Xi))} + ridge * (B^2)

In order to find the matrix B for which L is minimised, a Quasi-Newton Method is used to search for the optimized values of the m*(k-1) variables. Note that before we use the optimization procedure, we 'squeeze' the matrix B into a m*(k-1) vector. For details of the optimization procedure, please check weka.core.Optimization class.
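
For readability, a restatement of the formulas above in LaTeX notation (a direct transcription; here B^2 is read as the sum of the squared entries of B):

P_j(X_i) = \frac{\exp(X_i B_j)}{1 + \sum_{l=1}^{k-1} \exp(X_i B_l)}, \qquad j = 1, \ldots, k-1

P_k(X_i) = 1 - \sum_{j=1}^{k-1} P_j(X_i) = \frac{1}{1 + \sum_{l=1}^{k-1} \exp(X_i B_l)}

L = -\sum_{i=1}^{n} \left[ \sum_{j=1}^{k-1} Y_{ij} \ln P_j(X_i) + \Big(1 - \sum_{j=1}^{k-1} Y_{ij}\Big) \ln P_k(X_i) \right] + \mathrm{ridge} \cdot \lVert B \rVert^2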


Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights.

For more information see:

le Cessie, S. and van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics, Vol. 41, No. 1, pp. 191-201.

Note: Missing values are replaced using a ReplaceMissingValuesFilter, and nominal attributes are transformed into numeric attributes using a NominalToBinaryFilter.

OPTIONS
debug -- Output debug information to the console.

maxIts -- Maximum number of iterations to perform.

ridge -- Set the Ridge value in the log-likelihood.

NAME
weka.classifiers.functions.MultilayerPerceptron

SYNOPSIS
This neural network uses backpropagation to train.

OPTIONS
GUI -- Brings up a GUI. This will allow the pausing and altering of the neural network during training.

* To add a node, left click (this node will be automatically selected; ensure no other nodes were selected).
* To select a node, left click on it either while no other node is selected or while holding down the control key (this toggles that node between selected and not selected).
* To connect a node, first have the start node(s) selected, then click either the end node or an empty space (this will create a new node that is connected with the selected nodes). The selection status of nodes will stay the same after the connection. (Note these are directed connections; also, a connection between two nodes will not be established more than once, and certain connections that are deemed invalid will not be made.)
* To remove a connection, select one of the connected nodes and then right click the other node (it does not matter whether the node is the start or the end; the connection will be removed).
* To remove a node, right click it while no other nodes (including it) are selected. (This will also remove all connections to it.)
* To deselect a node, either left click it while holding down control, or right click on empty space.
* The raw inputs are provided from the labels on the left.
* The red nodes are hidden layers.
* The orange nodes are the output nodes.
* The labels on the right show the class the output node represents. Note that with a numeric class the output node will automatically be made into an unthresholded linear unit.

Alterations to the neural network can only be done while the network is not running. This also applies to the learning rate and other fields on the control panel.

* You can accept the network as being finished at any time.
* The network is automatically paused at the beginning.


* There is a running indication of what epoch the network is up to and what the (rough) error for that epoch was (or for the validation set, if that is being used). Note that this error value is based on a network that changes as the value is computed. (Also, whether the class is normalized will affect the error reported for numeric classes.)
* Once the network is done it will pause again and either wait to be accepted or be trained further.

Note that if the GUI is not set, the network will not require any interaction.

autoBuild -- Adds and connects up hidden layers in the network.

debug -- If set to true, classifier may output additional info to the console.

decay -- This will cause the learning rate to decrease. It divides the starting learning rate by the epoch number to determine what the current learning rate should be. This may help to stop the network from diverging from the target output, as well as improve general performance. Note that the decaying learning rate will not be shown in the GUI, only the original learning rate. If the learning rate is changed in the GUI, this is treated as the starting learning rate.

hiddenLayers -- This defines the hidden layers of the neural network: a comma-separated list of positive whole numbers, one for each hidden layer. To have no hidden layers, put a single 0 here. This will only be used if autoBuild is set. There are also wildcard values: 'a' = (attribs + classes) / 2, 'i' = attribs, 'o' = classes, 't' = attribs + classes.

learningRate -- The amount the weights are updated.

momentum -- Momentum applied to the weights during updating.

nominalToBinaryFilter -- This will preprocess the instances with the filter. This could help improve performance if there are nominal attributes in the data.

normalizeAttributes -- This will normalize the attributes. This could help improve performance of the network. It is not reliant on the class being numeric. It will also normalize nominal attributes (after they have been run through the nominal-to-binary filter, if that is in use) so that the nominal values are between -1 and 1.

normalizeNumericClass -- This will normalize the class if it is numeric. This could help improve performance of the network; the class is normalized to be between -1 and 1. Note that this is done only internally; the output will be scaled back to the original range.

randomSeed -- Seed used to initialise the random number generator. Random numbers are used for setting the initial weights of the connections between nodes, and also for shuffling the training data.

reset -- This will allow the network to reset with a lower learning rate. If the network diverges from the answer, this will automatically reset the network with a lower learning rate and begin training again. This option is only available if the GUI is not set. Note that if the network diverges but isn't allowed to reset, it will fail the training process and return an error message.

trainingTime -- The number of epochs to train through. If the validation set is non-zero then it can terminate the network early.


validationSetSize -- The percentage size of the validation set. (The training will continue until it is observed that the error on the validation set has been consistently getting worse, or until the training time is reached.) If this is set to zero, no validation set will be used and the network will instead train for the specified number of epochs.

validationThreshold -- Used to terminate validation testing. The value here dictates how many times in a row the validation set error can get worse before training is terminated.
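
A minimal usage sketch (not from the original document), assuming a recent WEKA 3.x API; the file name and parameter values are illustrative only:

import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MLPDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setGUI(false);            // no interaction required when the GUI is off
        mlp.setHiddenLayers("a");     // wildcard: (attribs + classes) / 2 hidden nodes
        mlp.setLearningRate(0.3);
        mlp.setMomentum(0.2);
        mlp.setTrainingTime(500);     // number of epochs
        mlp.setValidationSetSize(10); // hold out 10% for early stopping

        mlp.buildClassifier(data);
        System.out.println(mlp);      // prints the learned node weights
    }
}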

NAME
weka.classifiers.functions.PaceRegression

SYNOPSIS
Class for building pace regression linear models and using them for prediction.

Under regularity conditions, pace regression is provably optimal when the number of coefficients tends to infinity. It consists of a group of estimators that are either overall optimal or optimal under certain conditions.

The current work of the pace regression theory, and therefore also this implementation, do not handle:

- missing values
- non-binary nominal attributes
- the case where n - k is small, where n is the number of instances and k is the number of coefficients (the threshold used in this implementation is 20)

For more information see:

Wang, Y. (2000). A new approach to fitting linear models in high dimensional spaces. PhD Thesis. Department of Computer Science, University of Waikato, New Zealand.

Wang, Y. and Witten, I. H. (2002). Modeling for optimal probability prediction. Proceedings of ICML'2002. Sydney.

OPTIONS
debug -- Output debug information to the console.

estimator -- The estimator to use.

eb -- Empirical Bayes estimator for normal mixture (default)
nested -- Optimal nested model selector for normal mixture
subset -- Optimal subset selector for normal mixture
pace2 -- PACE2 for Chi-square mixture
pace4 -- PACE4 for Chi-square mixture
pace6 -- PACE6 for Chi-square mixture
ols -- Ordinary least squares estimator
aic -- AIC estimator
bic -- BIC estimator
ric -- RIC estimator
olsc -- Ordinary least squares subset selector with a threshold


threshold -- Threshold for the olsc estimator.

NAME
weka.classifiers.functions.RBFNetwork

SYNOPSIS
Class that implements a normalized Gaussian radial basis function network. It uses the k-means clustering algorithm to provide the basis functions and learns either a logistic regression (discrete class problems) or linear regression (numeric class problems) on top of that. Symmetric multivariate Gaussians are fit to the data from each cluster. If the class is nominal it uses the given number of clusters per class. It standardizes all numeric attributes to zero mean and unit variance.

OPTIONS
clusteringSeed -- The random seed to pass on to K-means.

debug -- If set to true, classifier may output additional info to the console.

maxIts -- Maximum number of iterations for the logistic regression to perform. Only applied to discrete class problems.

minStdDev -- Sets the minimum standard deviation for the clusters.

numClusters -- The number of clusters for K-Means to generate.

ridge -- Set the Ridge value for the logistic or linear regression.

NAME
weka.classifiers.functions.SimpleLinearRegression

SYNOPSIS
Learns a simple linear regression model. Picks the attribute that results in the lowest squared error. Missing values are not allowed. Can only deal with numeric attributes.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.functions.SimpleLogistic

SYNOPSIS
Classifier for building linear logistic regression models. LogitBoost with simple regression functions as base learners is used for fitting the logistic models. The optimal number of LogitBoost iterations to perform is cross-validated, which leads to automatic attribute selection. For more information see: N. Landwehr, M. Hall, E. Frank, "Logistic Model Trees" (ECML 2003).

OPTIONS
debug -- If set to true, classifier may output additional info to the console.


errorOnProbabilities -- Use error on the probabilities as error measure when determining the best number of LogitBoost iterations. If set, the number of LogitBoost iterations is chosen that minimizes the root mean squared error (either on the training set or in the cross-validation, depending on useCrossValidation).

heuristicStop -- If heuristicStop > 0, the heuristic for greedy stopping while cross-validating the number of LogitBoost iterations is enabled. This means LogitBoost is stopped if no new error minimum has been reached in the last heuristicStop iterations. Using this heuristic is recommended, as it gives a large speed-up, especially on small datasets. The default value is 50.

maxBoostingIterations -- Sets the maximum number of iterations for LogitBoost. The default value is 500; for very small/large datasets a lower/higher value might be preferable.

numBoostingIterations -- Set fixed number of iterations for LogitBoost. If >= 0, this sets the number of LogitBoost iterations to perform. If < 0, the number is cross-validated or a stopping criterion on the training set is used (depending on the value of useCrossValidation).

useCrossValidation -- Sets whether the number of LogitBoost iterations is to be cross-validated or the stopping criterion on the training set should be used. If not set (and no fixed number of iterations was given), the number of LogitBoost iterations is used that minimizes the error on the training set (misclassification error or error on probabilities depending on errorOnProbabilities).

NAME
weka.classifiers.functions.SMO

SYNOPSIS
Implements John Platt's sequential minimal optimization algorithm for training a support vector classifier.

This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default. (In that case the coefficients in the output are based on the normalized data, not the original data --- this is important for interpreting the classifier.)

Multi-class problems are solved using pairwise classification.

To obtain proper probability estimates, use the option that fits logistic regression models to the outputs of the support vector machine. In the multi-class case the predicted probabilities are coupled using Hastie and Tibshirani's pairwise coupling method.

Note: for improved speed normalization should be turned off when operating on SparseInstances.

For more information on the SMO algorithm, see

J. Platt (1998). "Fast Training of Support Vector Machines using Sequential Minimal Optimization". Advances in Kernel Methods - Support Vector Learning, B. Schoelkopf, C. Burges, and A. Smola, eds., MIT Press.

S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, K.R.K. Murthy, "Improvements to Platt's SMO Algorithm for SVM Classifier Design". Neural Computation, 13(3), pp 637-649, 2001.

OPTIONS
buildLogisticModels -- Whether to fit logistic models to the outputs (for proper probability estimates).


c -- The complexity parameter C.

cacheSize -- The size of the kernel cache (should be a prime number). Use 0 for full cache.

debug -- If set to true, classifier may output additional info to the console.

epsilon -- The epsilon for round-off error (shouldn't be changed).

exponent -- The exponent for the polynomial kernel.

featureSpaceNormalization -- Whether feature-space normalization is performed (only available for non-linear polynomial kernels).

filterType -- Determines how/if the data will be transformed.

gamma -- The value of the gamma parameter for RBF kernels.

lowerOrderTerms -- Whether lower order polynomials are also used (only available for non-linear polynomial kernels).

numFolds -- The number of folds for cross-validation used to generate training data for logistic models (-1 means use training data).

randomSeed -- Random number seed for the cross-validation.

toleranceParameter -- The tolerance parameter (shouldn't be changed).

useRBF -- Whether to use an RBF kernel instead of a polynomial one.
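
A hedged sketch (not part of the source text) of training SMO with logistic models fitted to its outputs, assuming a WEKA 3.x API and a placeholder two-class dataset:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class SMODemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        SMO smo = new SMO();
        // -C: complexity parameter; -M: fit logistic models for proper probability estimates
        smo.setOptions(Utils.splitOptions("-C 1.0 -M"));

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(smo, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}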

NAME
weka.classifiers.functions.SMOreg

SYNOPSIS
Implements Alex Smola and Bernhard Scholkopf's sequential minimal optimization algorithm for training a support vector regression model. This implementation globally replaces all missing values and transforms nominal attributes into binary ones. It also normalizes all attributes by default. (Note that the coefficients in the output are based on the normalized/standardized data, not the original data.) For more information on the SMO algorithm, see

Alex J. Smola, Bernhard Scholkopf (1998). "A Tutorial on Support Vector Regression". NeuroCOLT2 Technical Report Series - NC2-TR-1998-030.

S.K. Shevade, S.S. Keerthi, C. Bhattacharyya, K.R.K. Murthy, "Improvements to SMO Algorithm for SVM Regression". Technical Report CD-99-16, Control Division Dept of Mechanical and Production Engineering, National University of Singapore.

OPTIONS
c -- The complexity parameter C.

cacheSize -- The size of the kernel cache (should be a prime number).


debug -- If set to true, classifier may output additional info to the console.

eps -- The epsilon for round-off error (shouldn't be changed).

epsilon -- The amount up to which deviations are tolerated. Watch out, the value of epsilon is used with the (normalized/standardized) data.

exponent -- The exponent for the polynomial kernel.

featureSpaceNormalization -- Whether feature-space normalization is performed (only available for non-linear polynomial kernels).

filterType -- Determines how/if the data will be transformed.

gamma -- The value of the gamma parameter for RBF kernels.

lowerOrderTerms -- Whether lower order polynomials are also used (only available for non-linear polynomial kernels).

toleranceParameter -- The tolerance parameter (shouldn't be changed).

useRBF -- Whether to use an RBF kernel instead of a polynomial one.

NAME
weka.classifiers.functions.VotedPerceptron

SYNOPSIS
Implementation of the voted perceptron algorithm by Freund and Schapire. Globally replaces all missing values, and transforms nominal attributes into binary ones. For more information, see:

Y. Freund and R. E. Schapire (1998). Large margin classification using the perceptron algorithm. Proc. 11th Annu. Conf. on Comput. Learning Theory, pp. 209-217, ACM Press, New York, NY.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

exponent -- Exponent for the polynomial kernel.

maxK -- The maximum number of alterations to the perceptron.

numIterations -- Number of iterations to be performed.

seed -- Seed for the random number generator.

NAME
weka.classifiers.functions.Winnow

SYNOPSIS
Implements Winnow and Balanced Winnow algorithms by Littlestone. For more information, see


N. Littlestone (1988). "Learning quickly when irrelevant attributes abound: A new linear threshold algorithm". Machine Learning 2, pp. 285-318.

and

N. Littlestone (1989). "Mistake bounds and logarithmic linear-threshold learning algorithms". Technical report UCSC-CRL-89-11, University of California, Santa Cruz.

Does classification for problems with nominal attributes (which it converts into binary attributes).

OPTIONS
alpha -- Promotion coefficient alpha.

balanced -- Whether to use the balanced version of the algorithm.

beta -- Demotion coefficient beta.

debug -- If set to true, classifier may output additional info to the console.

defaultWeight -- Initial value of weights/coefficients.

numIterations -- The number of iterations to be performed.

seed -- Random number seed used for data shuffling (-1 means no randomization).

threshold -- Prediction threshold (-1 means: set to number of attributes).

NAME
weka.classifiers.lazy.IB1

SYNOPSIS
Nearest-neighbour classifier. Uses normalized Euclidean distance to find the training instance closest to the given test instance, and predicts the same class as this training instance. If multiple instances have the same (smallest) distance to the test instance, the first one found is used. For more information, see

Aha, D., and D. Kibler (1991) "Instance-based learning algorithms", Machine Learning, vol.6, pp. 37-66.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.lazy.IBk

SYNOPSIS
K-nearest neighbours classifier. Normalizes attributes by default. Can select an appropriate value of K based on cross-validation. Can also do distance weighting. For more information, see

Aha, D., and D. Kibler (1991) "Instance-based learning algorithms", Machine Learning, vol.6, pp. 37-66.


OPTIONS
KNN -- The number of neighbours to use.

crossValidate -- Whether hold-one-out cross-validation will be used to select the best k value.

debug -- If set to true, classifier may output additional info to the console.

distanceWeighting -- Gets the distance weighting method used.

meanSquared -- Whether the mean squared error is used rather than mean absolute error when doing cross-validation for regression problems.

noNormalization -- Whether attribute normalization is turned off.

windowSize -- Gets the maximum number of instances allowed in the training pool. The addition of new instances above this value will result in old instances being removed. A value of 0 signifies no limit to the number of training instances.
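
As an illustrative sketch (assumed WEKA 3.x API; file name is a placeholder): let IBk pick the best k up to 10 by hold-one-out cross-validation, as the options above describe.

import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IBkDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("glass.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        IBk knn = new IBk();
        knn.setKNN(10);             // upper bound on the number of neighbours
        knn.setCrossValidate(true); // select the best k <= 10 by hold-one-out CV
        knn.buildClassifier(data);

        System.out.println(knn);    // prints a description of the classifier
    }
}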

NAME
weka.classifiers.lazy.KStar

SYNOPSIS
K* is an instance-based classifier; that is, the class of a test instance is based upon the class of those training instances similar to it, as determined by some similarity function. It differs from other instance-based learners in that it uses an entropy-based distance function. For more information on K*, see

John G. Cleary and Leonard E. Trigg (1995) "K*: An Instance-based Learner Using an Entropic Distance Measure", Proceedings of the 12th International Conference on Machine Learning, pp. 108-114.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

entropicAutoBlend -- Whether entropy-based blending is to be used.

globalBlend -- The parameter for global blending. Values are restricted to [0,100].

missingMode -- Determines how missing attribute values are treated.

NAME
weka.classifiers.lazy.LBR

SYNOPSIS
Lazy Bayesian Rules Classifier. The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. Lazy Bayesian Rules selectively relaxes the independence assumption, achieving lower error rates over a range of learning tasks. LBR defers processing to classification time, making it a highly efficient and accurate classification algorithm when small numbers of objects are to be classified.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.


NAME
weka.classifiers.lazy.LWL

SYNOPSIS
Class for performing locally weighted learning. Can do classification (e.g. using naive Bayes) or regression (e.g. using linear regression). The base learner needs to implement WeightedInstancesHandler. For more info, see

Eibe Frank, Mark Hall, and Bernhard Pfahringer (2003). "Locally Weighted Naive Bayes". Conference on Uncertainty in AI.

Atkeson, C., A. Moore, and S. Schaal (1996) "Locally weighted learning" AI Reviews.

OPTIONS
KNN -- How many neighbours are used to determine the width of the weighting function (<= 0 means all neighbours).

classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

dontNormalize -- Turns off normalization for attribute values in distance calculation.

weightingKernel -- Determines the weighting function. [0 = Linear, 1 = Epanechnikov, 2 = Tricube, 3 = Inverse, 4 = Gaussian, 5 = Constant; default 0 = Linear.]

NAME
weka.classifiers.meta.AdaBoostM1

SYNOPSIS
Class for boosting a nominal class classifier using the Adaboost M1 method. Only nominal class problems can be tackled. Often dramatically improves performance, but sometimes overfits. For more information, see

Yoav Freund and Robert E. Schapire (1996). "Experiments with a new boosting algorithm". Proc International Conference on Machine Learning, pages 148-156, Morgan Kaufmann, San Francisco.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of iterations to be performed.

seed -- The random number seed to be used.

useResampling -- Whether resampling is used instead of reweighting.

weightThreshold -- Weight threshold for weight pruning.
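
A brief sketch (assumptions: WEKA 3.x API, placeholder dataset file) of boosting the DecisionStump described later in this document:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("vote.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new DecisionStump()); // weak base learner
        boost.setNumIterations(50);               // boosting rounds

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(boost, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}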

NAME
weka.classifiers.meta.AdditiveRegression

SYNOPSIS
Meta classifier that enhances the performance of a regression base classifier. Each iteration fits a model to the residuals left by the classifier on the previous iteration. Prediction is accomplished by adding the predictions of each classifier. Reducing the shrinkage (learning rate) parameter helps prevent overfitting and has a smoothing effect but increases the learning time. For more information see: Friedman, J.H. (1999). Stochastic Gradient Boosting. Technical Report, Stanford University. http://www-stat.stanford.edu/~jhf/ftp/stobst.ps.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of iterations to be performed.

shrinkage -- Shrinkage rate. Smaller values help prevent overfitting and have a smoothing effect (but increase learning time). Default = 1.0, i.e. no shrinkage.

NAME
weka.classifiers.meta.AttributeSelectedClassifier

SYNOPSIS
Dimensionality of training and test data is reduced by attribute selection before being passed on to a classifier.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

evaluator -- Set the attribute evaluator to use. This evaluator is used during the attribute selection phase before the classifier is invoked.

search -- Set the search method. This search method is used during the attribute selection phase before the classifier is invoked.

NAME
weka.classifiers.meta.Bagging

SYNOPSIS
Class for bagging a classifier to reduce variance. Can do classification and regression depending on the base learner. For more information, see

Leo Breiman (1996). "Bagging predictors". Machine Learning, 24(2):123-140.

OPTIONS
bagSizePercent -- Size of each bag, as a percentage of the training set size.

calcOutOfBag -- Whether the out-of-bag error is calculated.


classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of iterations to be performed.

seed -- The random number seed to be used.
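
A small sketch (not from the original; WEKA 3.x API assumed) showing bagged J48 trees with the out-of-bag error estimate enabled:

import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("credit-g.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bag = new Bagging();
        bag.setClassifier(new J48());
        bag.setNumIterations(25);
        bag.setBagSizePercent(100); // full-size bags, needed for OOB estimation
        bag.setCalcOutOfBag(true);  // estimate error from out-of-bag instances

        bag.buildClassifier(data);
        System.out.println("Out-of-bag error: " + bag.measureOutOfBagError());
    }
}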

NAME
weka.classifiers.meta.ClassificationViaRegression

SYNOPSIS
Class for doing classification using regression methods. Class is binarized and one regression model is built for each class value. For more information, see, for example

E. Frank, Y. Wang, S. Inglis, G. Holmes, and I.H. Witten (1998) "Using model trees for classification", Machine Learning, Vol.32, No.1, pp. 63-76.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.meta.CostSensitiveClassifier

SYNOPSIS
A metaclassifier that makes its base classifier cost-sensitive. Two methods can be used to introduce cost-sensitivity: reweighting training instances according to the total cost assigned to each class; or predicting the class with minimum expected misclassification cost (rather than the most likely class). Performance can often be improved by using a Bagged classifier to improve the probability estimates of the base classifier.

OPTIONS
classifier -- The base classifier to be used.

costMatrix -- Sets the cost matrix explicitly. This matrix is used if the costMatrixSource property is set to "Supplied".

costMatrixSource -- Sets where to get the cost matrix. The two options are to use the supplied explicit cost matrix (the setting of the costMatrix property), or to load a cost matrix from a file when required (this file will be loaded from the directory set by the onDemandDirectory property and will be named relation_name.cost).

debug -- If set to true, classifier may output additional info to the console.

minimizeExpectedCost -- Sets whether the minimum expected cost criterion will be used. If this is false, the training data will be reweighted according to the costs assigned to each class. If true, the minimum expected cost criterion will be used.

onDemandDirectory -- Sets the directory where cost files are loaded from. This option is used when the costMatrixSource is set to "On Demand".


seed -- The random number seed to be used.

NAME
weka.classifiers.meta.CVParameterSelection

SYNOPSIS
Class for performing parameter selection by cross-validation for any classifier. For more information, see:
R. Kohavi (1995). Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD Thesis. Department of Computer Science, Stanford University.

OPTIONS
CVParameters -- Sets the scheme parameters which are to be set by cross-validation. The format for each string should be:
param_char lower_bound upper_bound number_of_steps
e.g. to search a parameter -P from 1 to 10 by increments of 1: "P 1 10 11"

classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numFolds -- Get the number of folds used for cross-validation.

seed -- The random number seed to be used.
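
An illustrative sketch (WEKA 3.x API assumed; the parameter range is hypothetical) that cross-validates J48's pruning confidence -C using the string format described above:

import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CVParamDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("soybean.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        CVParameterSelection ps = new CVParameterSelection();
        ps.setClassifier(new J48());
        ps.setNumFolds(10);
        // search J48's -C from 0.1 to 0.5 in 5 steps (format: param lower upper steps)
        ps.addCVParameter("C 0.1 0.5 5");

        ps.buildClassifier(data); // selects the best -C, then trains on all data
        System.out.println(ps);
    }
}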

NAME
weka.classifiers.meta.Decorate

SYNOPSIS
DECORATE is a meta-learner for building diverse ensembles of classifiers by using specially constructed artificial training examples. Comprehensive experiments have demonstrated that this technique is consistently more accurate than the base classifier, Bagging and Random Forests. Decorate also obtains higher accuracy than Boosting on small training sets, and achieves comparable performance on larger training sets. For more details see:

P. Melville & R. J. Mooney. Constructing diverse classifier ensembles using artificial training examples (IJCAI 2003).
P. Melville & R. J. Mooney. Creating diversity in ensembles using artificial data (submitted).

OPTIONS
artificialSize -- Determines the number of artificial examples to use during training, specified as a proportion of the training data. Higher values can increase ensemble diversity.

classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

desiredSize -- The desired number of member classifiers in the Decorate ensemble. Decorate may terminate before this size is reached (depending on the value of numIterations). Larger ensemble sizes usually lead to more accurate models, but increase training time and model complexity.


numIterations -- The maximum number of Decorate iterations to run. Each iteration generates a classifier, but does not necessarily add it to the ensemble. Decorate stops when the desired ensemble size is reached. This parameter should be greater than or equal to desiredSize. If desiredSize is not being reached, it may help to increase this value.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.FilteredClassifier

SYNOPSIS
Class for running an arbitrary classifier on data that has been passed through an arbitrary filter. Like the classifier, the structure of the filter is based exclusively on the training data, and test instances will be processed by the filter without changing their structure.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

filter -- The filter to be used.
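
A minimal sketch (assumed WEKA 3.x API; file name is a placeholder) pairing J48 with an unsupervised Discretize filter; the filter is fit on the training data only, exactly as the synopsis describes:

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Discretize;

public class FilteredDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new Discretize()); // structure determined from training data
        fc.setClassifier(new J48());

        fc.buildClassifier(data); // test instances pass through the same filter
        System.out.println(fc);
    }
}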

NAME
weka.classifiers.meta.Grading

SYNOPSIS
Implements Grading. The base classifiers are "graded". For more information, see

Seewald A.K., Fuernkranz J. (2001): An Evaluation of Grading Classifiers, in Hoffmann F. et al. (eds.), Advances in Intelligent Data Analysis, 4th International Conference, IDA 2001, Proceedings, Springer, Berlin/Heidelberg/New York/Tokyo, pp. 115-124, 2001.

OPTIONS
classifiers -- The base classifiers to be used.

debug -- If set to true, classifier may output additional info to the console.

metaClassifier -- The meta classifier to be used.

numFolds -- The number of folds used for cross-validation.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.LogitBoost

SYNOPSIS
Class for performing additive logistic regression. This class performs classification using a regression scheme as the base learner, and can handle multi-class problems. For more information, see


Friedman, J., T. Hastie and R. Tibshirani (1998) "Additive Logistic Regression: a Statistical View of Boosting". Technical report. Stanford University.

Can do efficient internal cross-validation to determine appropriate number of iterations.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

likelihoodThreshold -- Threshold on improvement in likelihood.

numFolds -- Number of folds for internal cross-validation (default 0 means no cross-validation is performed).

numIterations -- The number of iterations to be performed.

numRuns -- Number of runs for internal cross-validation.

seed -- The random number seed to be used.

shrinkage -- Shrinkage parameter (use small value like 0.1 to reduce overfitting).

useResampling -- Whether resampling is used instead of reweighting.

weightThreshold -- Weight threshold for weight pruning (reduce to 90 for speeding up learning process).

NAME
weka.classifiers.meta.MetaCost

SYNOPSIS
This metaclassifier makes its base classifier cost-sensitive using the method specified in

Pedro Domingos (1999) "MetaCost: A general method for making classifiers cost-sensitive", Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp 155-164.

This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).

OPTIONS
bagSizePercent -- The size of each bag, as a percentage of the training set size.

classifier -- The base classifier to be used.

costMatrix -- A misclassification cost matrix.


costMatrixSource -- Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of bagging iterations.

onDemandDirectory -- Name of directory to search for cost files when loading costs on demand.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.MultiBoostAB

SYNOPSIS
Class for boosting a classifier using the MultiBoosting method.

MultiBoosting is an extension to the highly successful AdaBoost technique for forming decision committees. MultiBoosting can be viewed as combining AdaBoost with wagging. It is able to harness both AdaBoost's high bias and variance reduction with wagging's superior variance reduction. Using C4.5 as the base learning algorithm, MultiBoosting is demonstrated to produce decision committees with lower error than either AdaBoost or wagging significantly more often than the reverse over a large representative cross-section of UCI data sets. It offers the further advantage over AdaBoost of suiting parallel execution. For more information, see

Geoffrey I. Webb (2000). "MultiBoosting: A Technique for Combining Boosting and Wagging". Machine Learning, 40(2): 159-196, Kluwer Academic Publishers, Boston

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of iterations to be performed.

numSubCmtys -- Sets the (approximate) number of subcommittees.

seed -- The random number seed to be used.

useResampling -- Whether resampling is used instead of reweighting.

weightThreshold -- Weight threshold for weight pruning.

NAME
weka.classifiers.meta.MultiClassClassifier

SYNOPSIS
A metaclassifier for handling multi-class datasets with 2-class classifiers. This classifier is also capable of applying error correcting output codes for increased accuracy.

OPTIONS


classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

method -- Sets the method to use for transforming the multi-class problem into several 2-class ones.

randomWidthFactor -- Sets the width multiplier when using random codes. The number of codes generated will be this number multiplied by the number of classes.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.MultiScheme

SYNOPSIS
Class for selecting a classifier from among several using cross-validation on the training data or the performance on the training data. Performance is measured based on percent correct (classification) or mean-squared error (regression).

OPTIONS
classifiers -- The classifiers to be chosen from.

debug -- Whether debug information is output to console.

numFolds -- The number of folds used for cross-validation (if 0, performance on training data will be used).

seed -- The seed used for randomizing the data for cross-validation.

NAME
weka.classifiers.meta.OrdinalClassClassifier

SYNOPSIS
Meta classifier that allows standard classification algorithms to be applied to ordinal class problems. For more information see: Frank, E. and Hall, M. (in press). A simple approach to ordinal prediction. 12th European Conference on Machine Learning. Freiburg, Germany.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.meta.RacedIncrementalLogitBoost

SYNOPSIS
Classifier for incremental learning of large datasets by way of racing logit-boosted committees.

OPTIONS
classifier -- The base classifier to be used.


debug -- If set to true, classifier may output additional info to the console.

maxChunkSize -- The maximum number of instances to train the base learner with. The chunk sizes used will start at minChunkSize and repeatedly double, for as long as they remain less than or equal to the maximum size.

minChunkSize -- The minimum number of instances to train the base learner with.

pruningType -- The pruning method to use within each committee. Log likelihood pruning will discard new models if they have a negative effect on the log likelihood of the validation data.

seed -- The random number seed to be used.

useResampling -- Force the use of resampling data rather than using the weight-handling capabilities of the base classifier. Resampling is always used if the base classifier cannot handle weighted instances.

validationChunkSize -- The number of instances to hold out for validation. These instances will be taken from the beginning of the stream, so learning will not start until these instances have been consumed first.

NAME
weka.classifiers.meta.RandomCommittee

SYNOPSIS
Class for building an ensemble of randomizable base classifiers. Each base classifier is built using a different random number seed (but based on the same data). The final prediction is a straight average of the predictions generated by the individual base classifiers.

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numIterations -- The number of iterations to be performed.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.RegressionByDiscretization

SYNOPSIS
A regression scheme that employs any classifier on a copy of the data that has the class attribute (equal-width) discretized. The predicted value is the expected value of the mean class value for each discretized interval (based on the predicted probabilities for each interval).

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

numBins -- Number of bins for discretization.


NAME
weka.classifiers.meta.Stacking

SYNOPSIS
Combines several classifiers using the stacking method. Can do classification or regression. For more information, see

David H. Wolpert (1992). "Stacked generalization". Neural Networks, 5:241-259, Pergamon Press.

OPTIONS
classifiers -- The base classifiers to be used.

debug -- If set to true, classifier may output additional info to the console.

metaClassifier -- The meta classifier to be used.

numFolds -- The number of folds used for cross-validation.

seed -- The random number seed to be used.
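
A short sketch (not from the original document; WEKA 3.x API and the classifier choices are illustrative) combining two level-0 learners under a Logistic meta classifier:

import weka.classifiers.Classifier;
import weka.classifiers.functions.Logistic;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StackingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("vote.arff"); // placeholder file
        data.setClassIndex(data.numAttributes() - 1);

        Stacking stack = new Stacking();
        stack.setClassifiers(new Classifier[] { new J48(), new IBk() });
        stack.setMetaClassifier(new Logistic()); // learns from base-level predictions
        stack.setNumFolds(10);                   // folds used to build the meta-level data

        stack.buildClassifier(data);
        System.out.println(stack);
    }
}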

NAME
weka.classifiers.meta.StackingC

SYNOPSIS
Implements StackingC (a more efficient version of stacking). For more information, see

Seewald A.K.: "How to Make Stacking Better and Faster While Also Taking Care of an Unknown Weakness", in Sammut C., Hoffmann A. (eds.), Proceedings of the Nineteenth International Conference on Machine Learning (ICML 2002), Morgan Kaufmann Publishers, pp.554-561, 2002.

Note: requires meta classifier to be a numeric prediction scheme.

OPTIONS
classifiers -- The base classifiers to be used.

debug -- If set to true, classifier may output additional info to the console.

metaClassifier -- The meta classifier to be used.

numFolds -- The number of folds used for cross-validation.

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.ThresholdSelector

SYNOPSIS
A metaclassifier that selects a mid-point threshold on the probability output by a Classifier. The midpoint threshold is set so that a given performance measure is optimized; currently this is the F-measure.


Performance is measured either on the training data, a hold-out set or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities will reside between 0 and 1 (this is useful if the scheme normally produces probabilities in a very narrow range).

OPTIONS
classifier -- The base classifier to be used.

debug -- If set to true, classifier may output additional info to the console.

designatedClass -- Sets the class value for which the optimization is performed. The options are: pick the first class value; pick the second class value; pick whichever class is least frequent; pick whichever class value is most frequent; or pick the first class named any of "yes", "pos(itive)" or "1" (or the least frequent class if there are no matches).

evaluationMode -- Sets the method used to determine the threshold/performance curve. The options are: perform optimization based on the entire training set (may result in overfitting); perform an n-fold cross-validation (may be time consuming); perform one fold of an n-fold cross-validation (faster but likely less accurate).

numXValFolds -- Sets the number of folds used during full cross-validation and tuned fold evaluation. This number will be automatically reduced if there are insufficient positive examples.

rangeCorrection -- Sets the type of prediction range correction performed. The options are: do not do any range correction; expand predicted probabilities so that the minimum probability observed during the optimization maps to 0, and the maximum maps to 1 (values outside this range are clipped to 0 and 1).

seed -- The random number seed to be used.

NAME
weka.classifiers.meta.Vote

SYNOPSIS
Class for combining classifiers using unweighted average of probability estimates (classification) or numeric predictions (regression).

OPTIONS
classifiers -- The base classifiers to be used.

debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.misc.HyperPipes

SYNOPSIS
Class implementing a HyperPipe classifier. For each category a HyperPipe is constructed that contains all points of that category (essentially records the attribute bounds observed for each category). Test instances are classified according to the category that "most contains the instance". Does not handle numeric class, or missing values in test cases. Extremely simple algorithm, but has the advantage of being extremely fast, and works quite well when you have "smegloads" of attributes.

OPTIONS


debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.misc.VFI

SYNOPSIS
Classification by voting feature intervals. Intervals are constructed around each class for each attribute (basically discretization). Class counts are recorded for each interval on each attribute. Classification is by voting. For more info see Demiroz, G. and Guvenir, A. (1997) "Classification by voting feature intervals", ECML-97.

A simple attribute weighting scheme has been added. Higher weight is assigned to more confident intervals, where confidence is a function of entropy:

weight(att_i) = (entropy of class distrib att_i / max uncertainty)^-bias

OPTIONS
bias -- Strength of the bias towards more confident features.

debug -- If set to true, classifier may output additional info to the console.

weightByConfidence -- Weight feature intervals by confidence
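
To make the weighting formula in the synopsis concrete, here is a hypothetical helper (the class and method names are invented for illustration) that computes weight(att_i) from an attribute's entropy:

    public class VfiWeight {
      // Hypothetical helper mirroring VFI's confidence weighting:
      // weight(att_i) = (H_i / H_max)^(-bias), so lower-entropy (more
      // confident) attributes receive higher weight when bias > 0.
      static double intervalWeight(double entropy, double maxUncertainty, double bias) {
        return Math.pow(entropy / maxUncertainty, -bias);
      }

      public static void main(String[] args) {
        // An attribute whose class distribution has half the maximum
        // entropy gets weight 2 when bias = 1.
        System.out.println(intervalWeight(0.5, 1.0, 1.0));
      }
    }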

NAME
weka.classifiers.trees.ADTree

SYNOPSIS
Class for generating an alternating decision tree. The basic algorithm is based on:

Freund, Y., Mason, L.: "The alternating decision tree learning algorithm". Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, (1999) 124-133.

This version currently only supports two-class problems. The number of boosting iterations needs to be manually tuned to suit the dataset and the desired complexity/accuracy tradeoff. Induction of the trees has been optimized, and heuristic search methods have been introduced to speed learning.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

numOfBoostingIterations -- Sets the number of boosting iterations to perform. You will need to manually tune this parameter to suit the dataset and the desired complexity/accuracy tradeoff. More boosting iterations will result in larger (potentially more accurate) trees, but will make learning slower. Each iteration will add 3 nodes (1 split + 2 prediction) to the tree unless merging occurs.

randomSeed -- Sets the random seed to use for a random search.

saveInstanceData -- Sets whether the tree is to save instance data - the model will take up more memory if it does. If enabled, you will be able to visualize the instances at the prediction nodes when visualizing the tree.

searchPath -- Sets the type of search to perform when building the tree. The default option (Expand all paths) will do an exhaustive search. The other search methods are heuristic, so they are not guaranteed to find an optimal solution but they are much faster. Expand the heaviest path: searches the path with the most heavily weighted instances. Expand the best z-pure path: searches the path determined by the best z-pure estimate. Expand a random path: the fastest method, simply searches down a single random path on each iteration.
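
A sketch of training an ADTree from the Java API; the dataset name and iteration count are assumptions, and the setter name is inferred from the numOfBoostingIterations option above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.trees.ADTree;
    import weka.core.Instances;

    public class ADTreeDemo {
      public static void main(String[] args) throws Exception {
        // ADTree only supports two-class problems; "labor.arff" is a placeholder.
        Instances data = new Instances(new BufferedReader(new FileReader("labor.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        ADTree adt = new ADTree();
        // Each iteration adds 1 split node and 2 prediction nodes (unless merging occurs).
        adt.setNumOfBoostingIterations(15);
        adt.buildClassifier(data);

        System.out.println(adt);
      }
    }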

NAME
weka.classifiers.trees.DecisionStump

SYNOPSIS
Class for building and using a decision stump. Usually used in conjunction with a boosting algorithm. Does regression (based on mean-squared error) or classification (based on entropy). Missing is treated as a separate value.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.trees.Id3

SYNOPSIS
Class for constructing an unpruned decision tree based on the ID3 algorithm. Can only deal with nominal attributes. No missing values allowed. Empty leaves may result in unclassified instances. For more information see:

R. Quinlan (1986). "Induction of decision trees". Machine Learning. Vol.1, No.1, pp. 81-106.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.trees.J48

SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree. For more information, see

Ross Quinlan (1993). "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, San Mateo, CA.

OPTIONS
binarySplits -- Whether to use binary splits on nominal attributes when building the trees.

confidenceFactor -- The confidence factor used for pruning (smaller values incur more pruning).

debug -- If set to true, classifier may output additional info to the console.

minNumObj -- The minimum number of instances per leaf.

numFolds -- Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the tree.

reducedErrorPruning -- Whether reduced-error pruning is used instead of C4.5 pruning.


saveInstanceData -- Whether to save the training data for visualization.

seed -- The seed used for randomizing the data when reduced-error pruning is used.

subtreeRaising -- Whether to consider the subtree raising operation when pruning.

unpruned -- Whether to use an unpruned tree (no pruning is performed if set to true).

useLaplace -- Whether counts at leaves are smoothed based on Laplace.
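
A minimal sketch of building and cross-validating a J48 tree with some of the options above; the dataset name and parameter values are illustrative assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    public class J48Demo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("iris.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.1f);  // smaller value -> more pruning
        tree.setMinNumObj(5);            // at least 5 instances per leaf

        // Estimate accuracy with 10-fold cross-validation.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
      }
    }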

NAME
weka.classifiers.trees.LMT

SYNOPSIS
Classifier for building 'logistic model trees', which are classification trees with logistic regression functions at the leaves. The algorithm can deal with binary and multi-class target variables, numeric and nominal attributes and missing values. For more information see: N. Landwehr, M. Hall, E. Frank, 'Logistic Model Trees' (ECML 2003).

OPTIONS
convertNominal -- Convert all nominal attributes to binary ones before building the tree. This means that all splits in the final tree will be binary.

debug -- If set to true, classifier may output additional info to the console.

errorOnProbabilities -- Minimize error on probabilities instead of misclassification error when cross-validating the number of LogitBoost iterations. When set, the number of LogitBoost iterations is chosen that minimizes the root mean squared error instead of the misclassification error.

fastRegression -- Use a heuristic that avoids cross-validating the number of LogitBoost iterations at every node. When fitting the logistic regression functions at a node, LMT has to determine the number of LogitBoost iterations to run. Originally, this number was cross-validated at every node in the tree. To save time, this heuristic cross-validates the number only once and then uses that number at every node in the tree. Usually this does not decrease accuracy but improves runtime considerably.

minNumInstances -- Set the minimum number of instances at which a node is considered for splitting. The default value is 15.

numBoostingIterations -- Set a fixed number of iterations for LogitBoost. If >= 0, this sets a fixed number of LogitBoost iterations that is used everywhere in the tree. If < 0, the number is cross-validated.

splitOnResiduals -- Set the splitting criterion based on the residuals of LogitBoost. There are two possible splitting criteria for LMT: the default is to use the C4.5 splitting criterion that uses information gain on the class variable. The other splitting criterion tries to improve the purity in the residuals produced when fitting the logistic regression functions. The choice of the splitting criterion does not usually affect classification accuracy much, but can produce different trees.
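
A short sketch of configuring LMT; the dataset name is a placeholder and the setter names are assumed to follow the option names listed above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.trees.LMT;
    import weka.core.Instances;

    public class LMTDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("glass.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        LMT lmt = new LMT();
        lmt.setFastRegression(true);       // cross-validate LogitBoost iterations once, not per node
        lmt.setNumBoostingIterations(-1);  // < 0: choose the number by cross-validation
        lmt.buildClassifier(data);

        System.out.println(lmt);
      }
    }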

NAME
weka.classifiers.trees.M5P


SYNOPSIS
The original algorithm M5 was invented by Quinlan: Quinlan J. R. (1992). "Learning with continuous classes". Proceedings of the Australian Joint Conference on Artificial Intelligence. 343-348. World Scientific, Singapore.

Yong Wang made improvements and created M5': Wang, Y. and Witten, I. H. (1997). "Induction of model trees for predicting continuous classes". Proceedings of the poster papers of the European Conference on Machine Learning. University of Economics, Faculty of Informatics and Statistics, Prague.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.
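
A minimal sketch of evaluating M5P on a numeric-class dataset; the file name cpu.arff is a placeholder assumption:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.M5P;
    import weka.core.Instances;

    public class M5PDemo {
      public static void main(String[] args) throws Exception {
        // A dataset with a numeric class; "cpu.arff" is a placeholder name.
        Instances data = new Instances(new BufferedReader(new FileReader("cpu.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        M5P m5p = new M5P();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(m5p, data, 10, new Random(1));
        // For regression, the summary reports the correlation coefficient and error measures.
        System.out.println(eval.toSummaryString());
      }
    }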

NAME
weka.classifiers.trees.NBTree

SYNOPSIS
Class for generating a decision tree with naive Bayes classifiers at the leaves. For more information, see

Ron Kohavi (1996). "Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid". Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.trees.RandomForest

SYNOPSIS
Class for constructing a forest of random trees. For more information see:

Leo Breiman. "Random Forests". Machine Learning 45 (1):5-32, October 2001.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

numFeatures -- The number of attributes to be used in random selection (see RandomTree).

numTrees -- The number of trees to be generated.

seed -- The random number seed to be used.
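
A sketch of configuring a random forest; the dataset name and parameter values are assumptions, and the setters are assumed to mirror the numTrees, numFeatures and seed options above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.trees.RandomForest;
    import weka.core.Instances;

    public class RandomForestDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("soybean.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        RandomForest rf = new RandomForest();
        rf.setNumTrees(50);    // more trees: slower to build, usually more stable
        rf.setNumFeatures(0);  // 0: fall back to the default number of attributes
        rf.setSeed(42);        // fix the seed so runs are reproducible
        rf.buildClassifier(data);

        System.out.println(rf);
      }
    }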

NAME
weka.classifiers.trees.RandomTree

SYNOPSIS
Class for constructing a tree that considers K randomly chosen attributes at each node. Performs no pruning.

OPTIONS


KValue -- Sets the number of randomly chosen attributes.

debug -- Whether debug information is output to the console.

minNum -- The minimum total weight of the instances in a leaf.

seed -- The random number seed used for selecting attributes.

NAME
weka.classifiers.trees.REPTree

SYNOPSIS
Fast decision tree learner. Builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). Only sorts values for numeric attributes once. Missing values are dealt with by splitting the corresponding instances into pieces (i.e. as in C4.5).

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

maxDepth -- The maximum tree depth (-1 for no restriction).

minNum -- The minimum total weight of the instances in a leaf.

minVarianceProp -- The minimum proportion of the variance on all the data that needs to be present at a node in order for splitting to be performed in regression trees.

noPruning -- Whether pruning is disabled (no pruning is performed if set to true).

numFolds -- Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules.

seed -- The seed used for randomizing the data.
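
A brief sketch of using REPTree with a depth cap; the dataset name and parameter values are illustrative assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.trees.REPTree;
    import weka.core.Instances;

    public class REPTreeDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("vote.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        REPTree rep = new REPTree();
        rep.setMaxDepth(5);   // cap the depth; -1 would leave it unrestricted
        rep.setNumFolds(3);   // one fold for pruning, the rest for growing
        rep.buildClassifier(data);

        System.out.println(rep);
      }
    }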

NAME
weka.classifiers.trees.UserClassifier

SYNOPSIS
Interactively classify through visual means. You are presented with a scatter graph of the data against two user-selectable attributes, as well as a view of the decision tree. You can create binary splits by creating polygons around data plotted on the scatter graph, as well as by allowing another classifier to take over at points in the decision tree should you see fit.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.rules.ConjunctiveRule

SYNOPSIS
This class implements a single conjunctive rule learner that can predict for numeric and nominal class labels.


A rule consists of antecedents "AND"ed together and the consequent (class value) for the classification/regression. In this case, the consequent is the distribution of the available classes (or the mean for a numeric value) in the dataset. If a test instance is not covered by this rule, then it is predicted using the default class distribution/value of the data not covered by the rule in the training data. This learner selects an antecedent by computing the Information Gain of each antecedent and prunes the generated rule using Reduced Error Pruning (REP) or simple pre-pruning based on the number of antecedents.

For classification, the Information of one antecedent is the weighted average of the entropies of both the data covered and not covered by the rule. For regression, the Information is the weighted average of the mean-squared errors of both the data covered and not covered by the rule.

In pruning, the weighted average of the accuracy rates on the pruning data is used for classification, while the weighted average of the mean-squared errors on the pruning data is used for regression.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

exclusive -- Set whether to consider exclusive expressions for nominal attribute splits.

folds -- Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules.

minNo -- The minimum total weight of the instances in a rule.

numAntds -- Set the number of antecedents allowed in the rule if pre-pruning is used. If this value is other than -1, then pre-pruning will be used, otherwise the rule uses reduced-error pruning.

seed -- The seed used for randomizing the data.

NAME
weka.classifiers.rules.DecisionTable

SYNOPSIS
Class for building and using a simple decision table majority classifier. For more information see:

Kohavi R. (1995). "The Power of Decision Tables." In Proc European Conference on Machine Learning.

OPTIONS
crossVal -- Sets the number of folds for cross validation (1 = leave one out).

debug -- If set to true, classifier may output additional info to the console.

displayRules -- Sets whether rules are to be printed.

maxStale -- Sets the number of non-improving decision tables to consider before abandoning the search.


useIBk -- Sets whether IBk should be used instead of the majority class.

NAME
weka.classifiers.rules.JRip

SYNOPSIS
This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.

The algorithm is briefly described as follows:

Initialize RS = {}, and for each class, from the least prevalent to the most frequent, DO:

1. Building stage: repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.

1.1. Grow phase: grow one rule by greedily adding antecedents (or conditions) to the rule until the rule is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain: p(log(p/t)-log(P/T)).

1.2. Prune phase: incrementally prune each rule and allow the pruning of any final sequence of antecedents. The pruning metric is (p-n)/(p+n) -- but since this equals 2p/(p+n) - 1, this implementation simply uses p/(p+n) (actually (p+1)/(p+n+2), so that it is 0.5 when p+n is 0).

2. Optimization stage: after generating the initial ruleset {Ri}, generate and prune two variants of each rule Ri from randomized data using procedures 1.1 and 1.2. One variant is generated from an empty rule, while the other is generated by greedily adding antecedents to the original rule. Moreover, the pruning metric used here is (TP+TN)/(P+N). Then the smallest possible DL for each variant and the original rule is computed, and the variant with the minimal DL is selected as the final representative of Ri in the ruleset. After all the rules in {Ri} have been examined, if there are still residual positives, more rules are generated from the residual positives using the building stage again.

3. Delete from the ruleset any rule that would increase the DL of the whole ruleset if it were included, and add the resultant ruleset to RS.

ENDDO

Note that there seem to be 2 bugs in the original ripper program that would affect the ruleset size and accuracy slightly. This implementation avoids these bugs and thus is a little bit different from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seems to be some trivial difference between this implementation and the original ripper, especially for audiology data in UCI repository, where there are lots of classes of few instances.

For details, please see "Fast Effective Rule Induction", William W. Cohen, Machine Learning: Proceedings of the Twelfth International Conference (ML95).

PS. We have compared this implementation with the original ripper implementation in aspects of accuracy, ruleset size and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these aspects it seems to be quite comparable to the original ripper implementation. However, we didn't consider memory consumption optimization in this implementation.

OPTIONS
checkErrorRate -- Whether the check for an error rate >= 1/2 is included in the stopping criterion.

debug -- Whether debug information is output to the console.

folds -- Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules.

minNo -- The minimum total weight of the instances in a rule.

optimizations -- The number of optimization runs.

seed -- The seed used for randomizing the data.

usePruning -- Whether pruning is performed.
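
A minimal sketch of running JRip with the options above; the dataset name and parameter values are assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.rules.JRip;
    import weka.core.Instances;

    public class JRipDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("credit-g.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        JRip rip = new JRip();
        rip.setFolds(3);          // one fold for pruning, the rest for growing
        rip.setOptimizations(2);  // number of optimization runs (stage 2 above)
        rip.setSeed(1);
        rip.buildClassifier(data);

        // Print the learned rule list.
        System.out.println(rip);
      }
    }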

NAME
weka.classifiers.rules.M5Rules

SYNOPSIS
Generates a decision list for regression problems using separate-and-conquer. In each iteration it builds a model tree using M5 and makes the "best" leaf into a rule. Reference:

M. Hall, G. Holmes, E. Frank (1999). "Generating Rule Sets from Model Trees". Proceedings of the Twelfth Australian Joint Conference on Artificial Intelligence, Sydney, Australia. Springer-Verlag, pp. 1-12.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.rules.NNge

SYNOPSIS
Nearest-neighbor-like algorithm using non-nested generalized exemplars (which are hyperrectangles that can be viewed as if-then rules). For more information, see

Brent Martin (1995). "Instance-Based Learning: Nearest Neighbor With Generalization". Master's thesis, University of Waikato, Hamilton, New Zealand.

Sylvain Roy (2002). "Nearest Neighbor With Generalization". Unpublished, University of Canterbury, Christchurch, New Zealand.

OPTIONS


debug -- If set to true, classifier may output additional info to the console.

numAttemptsOfGeneOption -- Sets the number of attempts for generalization.

numFoldersMIOption -- Sets the number of folds for computing the mutual information.

NAME
weka.classifiers.rules.OneR

SYNOPSIS
Class for building and using a 1R classifier; in other words, uses the minimum-error attribute for prediction, discretizing numeric attributes. For more information, see

R.C. Holte (1993). "Very simple classification rules perform well on most commonly used datasets". Machine Learning, Vol. 11, pp. 63-91.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

minBucketSize -- The minimum bucket size used for discretizing numeric attributes.
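
A minimal sketch of building a 1R rule; the dataset name and bucket size are illustrative assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;

    public class OneRDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        OneR oneR = new OneR();
        oneR.setMinBucketSize(6);  // require at least 6 instances per discretization bucket
        oneR.buildClassifier(data);

        // Prints the single rule on the chosen attribute.
        System.out.println(oneR);
      }
    }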

NAME
weka.classifiers.rules.PART

SYNOPSIS
Class for generating a PART decision list. Uses separate-and-conquer. Builds a partial C4.5 decision tree in each iteration and makes the "best" leaf into a rule. For more information, see:

Eibe Frank and Ian H. Witten (1998). "Generating Accurate Rule Sets Without Global Optimization." In Shavlik, J., ed., Machine Learning: Proceedings of the Fifteenth International Conference, Morgan Kaufmann Publishers.

OPTIONS
binarySplits -- Whether to use binary splits on nominal attributes when building the partial trees.

confidenceFactor -- The confidence factor used for pruning (smaller values incur more pruning).

debug -- If set to true, classifier may output additional info to the console.

minNumObj -- The minimum number of instances per rule.

numFolds -- Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the rules.

reducedErrorPruning -- Whether reduced-error pruning is used instead of C4.5 pruning.

seed -- The seed used for randomizing the data when reduced-error pruning is used.

unpruned -- Whether to generate an unpruned decision list (no pruning is performed if set to true).


NAME
weka.classifiers.rules.Prism

SYNOPSIS
Class for building and using a PRISM rule set for classification. Can only deal with nominal attributes. Can't deal with missing values. Doesn't do any pruning. For more information, see

J. Cendrowska (1987). "PRISM: An algorithm for inducing modular rules". International Journal of Man-Machine Studies. Vol.27, No.4, pp.349-370.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

NAME
weka.classifiers.rules.Ridor

SYNOPSIS
An implementation of a RIpple-DOwn Rule learner. It generates a default rule first and then the exceptions for the default rule with the least (weighted) error rate. Then it generates the "best" exceptions for each exception and iterates until pure. Thus it performs a tree-like expansion of exceptions. The exceptions are a set of rules that predict classes other than the default. IREP is used to generate the exceptions.

OPTIONS
debug -- If set to true, classifier may output additional info to the console.

folds -- Determines the amount of data used for pruning. One fold is used for pruning, the rest for growing the rules.

majorityClass -- Whether the majority class is used as default.

minNo -- The minimum total weight of the instances in a rule.

seed -- The seed used for randomizing the data.

shuffle -- Determines how often the data is shuffled before a rule is chosen. If > 1, a rule is learned multiple times and the most accurate rule is chosen.

wholeDataErr -- Whether worth of rule is computed based on all the data or just based on data covered by rule.

NAME
weka.classifiers.rules.ZeroR

SYNOPSIS
Class for building and using a 0-R classifier. Predicts the mean (for a numeric class) or the mode (for a nominal class).

OPTIONS
debug -- If set to true, classifier may output additional info to the console.
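
Since 0-R ignores all attributes, it is mainly useful as a baseline. A sketch of measuring that baseline by cross-validation (the dataset name is a placeholder assumption):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.ZeroR;
    import weka.core.Instances;

    public class BaselineDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("iris.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // ZeroR always predicts the majority class (or the class mean),
        // giving a floor that any real classifier should beat.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new ZeroR(), data, 10, new Random(1));
        System.out.println("Baseline accuracy: " + eval.pctCorrect() + "%");
      }
    }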
