
Incremental Conceptual Clustering

Kalpa Gunaratna

Reading group discussions @Kno.e.sis

Based on Fisher’s Cobweb algorithm


Clustering *

• Clustering is the unsupervised classification of patterns into groups.

* Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31, no. 3 (1999): 264-323.


Focus on hierarchical clustering

• Single link clustering

The distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters.

In other words, it evaluates the dissimilarity between two clusters as the dissimilarity of the nearest patterns, one from each cluster.

• Complete link clustering

The distance between two clusters is the maximum of the distances between all pairs of patterns drawn from the two clusters.

In other words, it evaluates the dissimilarity between two clusters as the greatest distance between any two patterns, one from each cluster.

• Complete link produces compact clusters.
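The two cluster distances defined above can be sketched in a few lines of Python. This is only an illustration: the function names `single_link` and `complete_link` and the sample points are made up for this example, and Euclidean distance is assumed as the pattern distance.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    """Minimum distance over all cross-cluster pairs of patterns."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Maximum distance over all cross-cluster pairs of patterns."""
    return max(dist(p, q) for p, q in product(a, b))


# Hypothetical clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (6.0, 0.0)]

print(single_link(left, right))    # 2.0, nearest pair (1, 0) and (3, 0)
print(complete_link(left, right))  # 6.0, farthest pair (0, 0) and (6, 0)
```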


• Single link algorithm can extract concentric clusters as shown below whereas complete link cannot.


• However, the single link algorithm suffers from a chaining effect, as shown below, whereas complete link does not. For this reason, complete link is generally considered to give more useful clusters in real problems.
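The chaining effect can be reproduced with a naive agglomerative loop. The sketch below is not an efficient or standard implementation: `agglomerate` and the `chain` data are hypothetical, the points are one-dimensional with slowly growing gaps, and Python's built-in `min` and `max` stand in for single link and complete link.

```python
from itertools import combinations
from math import dist


def agglomerate(points, k, link):
    """Repeatedly merge the two closest clusters until k clusters remain.

    `link` reduces the cross-cluster distances to one number:
    pass `min` for single link, `max` for complete link.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def gap(pair):
            a, b = pair
            return link(dist(p, q) for p in clusters[a] for q in clusters[b])

        # combinations yields a < b, so index a stays valid after pop(b).
        a, b = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]


# A chain of points whose gaps grow slowly: 1.0, 1.1, 1.2, 1.3, 1.4.
chain = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (6.0,)]

print([len(c) for c in agglomerate(chain, 2, min)])  # [5, 1]
print([len(c) for c in agglomerate(chain, 2, max)])  # [4, 2]
```

Single link keeps absorbing the nearest neighbour and ends with one long chain plus a singleton, while complete link stops the chain earlier because the farthest pair inside a long cluster is penalized.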


• Dendrogram


Our focus – Incremental Conceptual Clustering (Cobweb) 1, 2

Given a set of observations, humans acquire concepts that organize those observations and use them in classifying future experiences. This type of concept formation can occur in the absence of a tutor and it can take place despite irrelevant and incomplete information.

1. Fisher, Douglas H. "Knowledge acquisition via incremental conceptual clustering." Machine learning 2, no. 2 (1987): 139-172.
2. Gennari, John H., Pat Langley, and Doug Fisher. "Models of incremental concept formation." Artificial intelligence 40, no. 1 (1989): 11-61.


• Cobweb

• Uses a hill climbing search strategy with operators that enable bi-directional travel in the search space.

  • Hill climbing is a classic AI search method in which one applies all operator instantiations, compares the resulting states using an evaluation function, selects the best state, and iterates until no more progress can be made.

• Has a function called Category Utility to decide on what action to take in the hill climbing search.

  • Computes similarity within clusters and dissimilarity between clusters.
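The hill climbing loop described above can be sketched generically. This is a toy illustration, not Cobweb itself: `hill_climb`, the integer state, and the quadratic evaluation function are assumptions made for the example. The two operators undo each other, which is the sense in which travel in the space is bi-directional.

```python
def hill_climb(state, operators, evaluate):
    """Greedy search: apply every operator, keep the best successor,
    and stop when no successor improves on the current state."""
    score = evaluate(state)
    while True:
        successors = [op(state) for op in operators]
        best = max(successors, key=evaluate, default=state)
        if evaluate(best) <= score:
            return state
        state, score = best, evaluate(best)


# Toy usage: climb toward the peak of f(x) = -(x - 3)^2 on the integers.
operators = [lambda x: x + 1, lambda x: x - 1]  # moves in both directions
peak = hill_climb(0, operators, lambda x: -(x - 3) ** 2)
print(peak)  # 3
```

In Cobweb the state is the concept hierarchy, the operators are the ones listed later in the slides, and the evaluation function is category utility.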


• Category utility function

• Intra-class similarity is measured by P(Ai = Vij | Ck), the predictability of the value.

  • The larger this probability, the greater the proportion of class members sharing the value and the more predictable the value is of class members.

• Inter-class similarity is measured by P(Ck | Ai = Vij), the predictiveness of the value.

  • The larger this probability, the fewer the objects in contrasting classes that share this value and the more predictive the value is of the class.
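Both probabilities can be estimated by counting. The sketch below uses a made-up toy data set (`animals`) and hypothetical helper names, and it assumes nominal attributes with no missing values.

```python
from fractions import Fraction

# Hypothetical toy data: (class label, {attribute: value}).
animals = [
    ("bird",   {"cover": "feathers", "flies": "yes"}),
    ("bird",   {"cover": "feathers", "flies": "no"}),
    ("mammal", {"cover": "hair",     "flies": "no"}),
    ("mammal", {"cover": "hair",     "flies": "yes"}),
    ("mammal", {"cover": "hair",     "flies": "no"}),
]


def predictability(data, attr, value, cls):
    """P(attr = value | cls): share of class members that have the value."""
    members = [obj for label, obj in data if label == cls]
    return Fraction(sum(obj[attr] == value for obj in members), len(members))


def predictiveness(data, attr, value, cls):
    """P(cls | attr = value): share of objects with the value that are in cls."""
    holders = [label for label, obj in data if obj[attr] == value]
    return Fraction(holders.count(cls), len(holders))


print(predictability(animals, "cover", "feathers", "bird"))  # 1
print(predictiveness(animals, "cover", "feathers", "bird"))  # 1
print(predictability(animals, "flies", "yes", "bird"))       # 1/2
print(predictiveness(animals, "flies", "yes", "bird"))       # 1/2
```

In this toy data, feathers are both perfectly predictable within birds and perfectly predictive of birds, while flying is neither.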


$$\sum_{k} \sum_{i} \sum_{j} P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij})\, P(A_i = V_{ij} \mid C_k)$$

Using Bayes’ theorem

$$\sum_{k} P(C_k) \sum_{i} \sum_{j} P(A_i = V_{ij} \mid C_k)^2$$

The inner double sum, $\sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2$, is the expected number of attribute values that one can correctly guess for an arbitrary member of class $C_k$.
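For readers who want the intermediate step, the Bayes substitution can be written out as follows. Both sides of the first line equal the joint probability of the value and the class.

$$P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij}) = P(A_i = V_{ij},\, C_k) = P(C_k)\, P(A_i = V_{ij} \mid C_k)$$

Substituting this into the weighted product of predictiveness and predictability gives

$$\sum_{k} \sum_{i} \sum_{j} P(C_k)\, P(A_i = V_{ij} \mid C_k)\, P(A_i = V_{ij} \mid C_k) = \sum_{k} P(C_k) \sum_{i} \sum_{j} P(A_i = V_{ij} \mid C_k)^2$$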


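• Category utility (CU) is then defined as the increase in the expected number of attribute values that can be correctly guessed, given a set of n categories, over the expected number of correct guesses without such knowledge.

• The result is divided by the number of categories (n above, also written K) so that partitions of different sizes can be compared. This is what allows the outcomes of merging, splitting, or adding nodes to be scored against each other (will discuss now).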
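Written out from the description above, category utility for a partition into n categories takes the form

$$CU = \frac{1}{n} \sum_{k=1}^{n} P(C_k) \left[ \sum_{i} \sum_{j} P(A_i = V_{ij} \mid C_k)^2 - \sum_{i} \sum_{j} P(A_i = V_{ij})^2 \right]$$

A minimal Python sketch of this computation follows. The function names and the toy objects are hypothetical, probabilities are estimated by simple counting, and the sketch assumes nominal attributes with no missing values.

```python
from collections import Counter


def expected_correct_guesses(objects):
    """Sum over attributes and values of P(attribute = value)^2 in `objects`."""
    counts = Counter((attr, val) for obj in objects for attr, val in obj.items())
    return sum((c / len(objects)) ** 2 for c in counts.values())


def category_utility(partition):
    """Average over categories of P(C_k) times the gain in expected correct guesses."""
    everything = [obj for cluster in partition for obj in cluster]
    baseline = expected_correct_guesses(everything)
    gain = sum(
        len(cluster) / len(everything)
        * (expected_correct_guesses(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)


red_round = {"color": "red", "shape": "round"}
blue_square = {"color": "blue", "shape": "square"}

good = [[red_round, red_round], [blue_square, blue_square]]
mixed = [[red_round, blue_square], [red_round, blue_square]]

print(category_utility(good))   # 0.5
print(category_utility(mixed))  # 0.0
```

The partition that groups identical objects lets one guess every attribute value inside each category, so it scores higher than the partition that mixes them.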


• There are four main operators in creating the hierarchy.

  • Classify into an existing class.

  • Create a new class.

  • Combine two classes into one (merging).

  • Divide a class into several classes (splitting).

• The last two operators let the search undo earlier placements, so the result is normally less sensitive to the order in which items are clustered.
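To make the operators concrete, the sketch below scores the alternatives for one incoming object at a single level of the hierarchy and keeps the one with the highest category utility. It is a simplified illustration under several assumptions, not Fisher's algorithm: the partition is flat rather than a tree, every pair of classes is tried for merging, splitting is left out because it needs child nodes to promote, and all function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def score(objects):
    """Expected number of correctly guessed attribute values within `objects`."""
    counts = Counter((a, v) for obj in objects for a, v in obj.items())
    return sum((c / len(objects)) ** 2 for c in counts.values())


def category_utility(partition):
    everything = [obj for cluster in partition for obj in cluster]
    base = score(everything)
    return sum(len(c) / len(everything) * (score(c) - base)
               for c in partition) / len(partition)


def candidate_partitions(partition, obj):
    """Yield (operator name, resulting partition) for one incoming object."""
    # Operator 1: classify into each existing class in turn.
    for i in range(len(partition)):
        yield "classify", [c + [obj] if j == i else c
                           for j, c in enumerate(partition)]
    # Operator 2: create a new singleton class.
    yield "create", partition + [[obj]]
    # Operator 3: merge two classes and place the object in the merged class.
    for i, j in combinations(range(len(partition)), 2):
        rest = [c for n, c in enumerate(partition) if n not in (i, j)]
        yield "merge", rest + [partition[i] + partition[j] + [obj]]


def place(partition, obj):
    """One hill climbing step: keep the candidate with the highest CU."""
    return max(candidate_partitions(partition, obj),
               key=lambda cand: category_utility(cand[1]))


red_round = {"color": "red", "shape": "round"}
blue_square = {"color": "blue", "shape": "square"}
green_triangle = {"color": "green", "shape": "triangle"}

start = [[red_round], [blue_square]]
print(place(start, dict(red_round))[0])  # classify
print(place(start, green_triangle)[0])   # create
```

A second red, round object joins the existing red, round class, while an object that shares no values with either class gets a class of its own.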


• Merging


• Splitting


• Positive points about Incremental Conceptual Clustering (as I see it)

  • Unsupervised

  • Reduced sensitivity to input order, thanks to merging and splitting

  • Efficient – does not compute similarity/dissimilarity between all pairs/combinations

  • Good for dynamic environments

  • Bi-directional search space walk in the hierarchy construction

  • Tries to mimic human categorization behavior

  • Clustering is based on probability – not just a similarity score
