Data Mining: Implementation of Data Mining Techniques using RapidMiner software
Prepared by Mohammed Kharma
Definitions review
• Cluster: a collection of data objects
  – similar (or related) to one another within the same group
  – dissimilar (or unrelated) to the objects in other groups
• Cluster analysis
  – finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
Clustering Methods
• Partitioning:
  – Unsupervised learning algorithms that construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
  – Typical methods: k-means, k-medoids
• Hierarchical:
  – Create a hierarchical decomposition of the set of data (or objects) using some criterion
  – Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
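To make the hierarchical idea concrete, here is a minimal Python sketch of agglomerative (AGNES-style) clustering: every point starts in its own cluster and the two closest clusters are merged repeatedly. The single-linkage distance, Euclidean metric, the function name `agnes_single_link`, and stopping at k clusters are all assumptions for illustration; the slides do not specify a linkage criterion.

```python
import math

def agnes_single_link(points, k):
    """Merge clusters bottom-up (single linkage) until only k remain.

    points: list of numeric tuples; k: number of clusters to stop at.
    Returns a list of clusters, each a sorted list of point indices.
    """
    # Start with each point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (assumed): distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x; y > x, so deleting y keeps x's index valid.
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters

pts = [(1, 1), (1.5, 1), (8, 8), (8, 8.5), (4, 4.2)]  # hypothetical data
print(agnes_single_link(pts, 2))  # → [[0, 1, 4], [2, 3]]
```

Running the loop until a single cluster remains, and recording each merge, would yield the full hierarchy (dendrogram) rather than a flat cut at k clusters.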
Illustration and comparison of two clustering techniques using the RapidMiner tool and a Java application
• K-means algorithm: we performed two tests
  1. Using a Java program: program parameters K = 2; Data: 22 2123 2024 2225 33 2
K-means Clustering
• Input: the number of clusters K and the collection of n instances
• Output: a set of K clusters that minimizes the squared-error criterion
• Method:
  – Arbitrarily choose K instances as the initial cluster centers
  – Repeat
    • (Re)assign each instance to the cluster to which the instance is the most similar, based on the mean value of the instances in the cluster
    • Update the cluster means (compute the mean value of the instances for each cluster)
  – Until no change in the assignment
• Squared-error criterion:
  $$E = \sum_{i=1}^{K} \sum_{p \in C_i} |p - m_i|^2$$
  where m_i is the mean of cluster C_i and p ranges over the points in C_i
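The method and the squared-error criterion above can be sketched in Python as follows. This is a minimal sketch, not the original Java program: it assumes Euclidean distance as the similarity measure and random selection of the initial centers, and the names `k_means` and `squared_error` are illustrative.

```python
import math
import random

def squared_error(points, labels, means):
    """E = sum over clusters i of sum over p in C_i of |p - m_i|^2."""
    return sum(math.dist(p, means[c]) ** 2 for p, c in zip(points, labels))

def k_means(points, k, seed=0):
    """Plain k-means: returns (labels, means) once assignments stop changing."""
    rng = random.Random(seed)
    # Arbitrarily (here: randomly) choose k instances as the initial cluster centers.
    means = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    while True:
        # (Re)assign each instance to the cluster whose mean is nearest.
        new_labels = [min(range(k), key=lambda c: math.dist(p, means[c])) for p in points]
        if new_labels == labels:  # until no change in the assignment
            return labels, means
        labels = new_labels
        # Update the cluster means (mean value of the instances in each cluster).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                means[c] = tuple(sum(xs) / len(members) for xs in zip(*members))

pts = [(1, 1), (1.5, 1), (8, 8), (8, 8.5)]  # hypothetical data
labels, means = k_means(pts, 2)
print(squared_error(pts, labels, means))  # → 0.25
```

Because the initial centers are random, different seeds can produce different local optima on larger data sets; running several seeds and keeping the lowest E is a common remedy.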
The result of the K-means Java program
The result of K-means in RapidMiner
The result of K-means in RapidMiner (continued)
K-medoids
• Input: the number of clusters K and the collection of n instances
• Output: a set of K clusters that minimizes the sum of the dissimilarities of all the instances to their nearest medoids
• Method:
  – Arbitrarily choose K instances as the initial medoids
  – Repeat
    • (Re)assign each remaining instance to the cluster with the nearest medoid
    • Randomly select a non-medoid instance, O_r
    • Compute the total cost, S, of swapping a medoid O_j with O_r
    • If S < 0, then swap O_j with O_r to form the new set of K medoids
  – Until no change
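A minimal Python sketch of the k-medoids loop follows, assuming Euclidean distance as the dissimilarity and the first K instances as the initial medoids. One substitution is deliberate: instead of randomly selecting a single non-medoid O_r per iteration, it tries every (O_j, O_r) swap and keeps any with S < 0, as in the swap phase of PAM. The names `k_medoids` and `total_cost` are illustrative.

```python
import math

def total_cost(points, medoids):
    """Sum of each instance's dissimilarity (Euclidean distance) to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k):
    """PAM-style k-medoids; returns (medoid indices, labels)."""
    # Arbitrarily choose the first k instances as the initial medoids.
    medoids = list(range(k))
    cost = total_cost(points, medoids)
    improved = True
    while improved:  # until no change
        improved = False
        for j in range(k):                  # position of medoid O_j
            for r in range(len(points)):    # candidate non-medoid O_r
                if r in medoids:
                    continue
                candidate = medoids[:j] + [r] + medoids[j + 1:]
                new_cost = total_cost(points, candidate)
                s = new_cost - cost         # S = change in total cost if O_j is swapped with O_r
                if s < -1e-12:              # S < 0, with a tiny tolerance for float noise
                    medoids, cost, improved = candidate, new_cost, True
    # Assign each instance to the cluster with the nearest medoid.
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]])) for p in points]
    return medoids, labels

pts = [(1, 1), (1.5, 1), (8, 8), (8, 8.5)]  # hypothetical data
medoids, labels = k_medoids(pts, 2)
print(total_cost(pts, medoids))  # → 1.0
```

Trying every swap makes each pass cost O(K·n) evaluations of the total cost, which is why the randomized selection described above is attractive for larger data sets.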
The result of k-medoids in RapidMiner
Java live demo: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
Comparison
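• On our test data, both algorithms produced the same results
• Both require K to be specified in the input
• K-medoids is less influenced by outliers in the data, because each cluster is represented by an actual instance (its medoid) rather than by a mean, which extreme values can pull away from the bulk of the cluster
• Both methods assign each instance to exactly one cluster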
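The outlier point can be seen in one dimension. In this sketch the cluster values are hypothetical, and `medoid_1d` is an illustrative helper that picks the actual value minimizing the sum of absolute distances to all values.

```python
import statistics

def medoid_1d(values):
    """Return the actual value that minimizes the sum of absolute distances to all values."""
    return min(values, key=lambda m: sum(abs(v - m) for v in values))

cluster = [2, 3, 4, 5, 100]  # hypothetical cluster with one outlier (100)
print(statistics.mean(cluster))  # → 22.8  (mean is dragged toward the outlier)
print(medoid_1d(cluster))        # → 4     (medoid stays with the bulk of the data)
```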
Thank you