Introduction to Artificial Intelligence
Introduction to Artificial Intelligence Data Mining with Clustering Algorithms
Miłosz Kadziński Institute of Computing Science
Poznan University of Technology, Poland
www.cs.put.poznan.pl/mkadzinski/iai
Artificial Intelligence Introduction to Artificial Intelligence
A Few Words About Me
Miłosz Kadziński
e-mail: [email protected] (please use [IAI] in the e-mail’s subject)
ph.: +48 61 665 3022
room: 1.6.6 (Technical Library, BT; 1st floor)
consultation hours: Wed 9:45 – 11:15
slides: www.cs.put.poznan.pl/mkadzinski/iai
2003 – Adam Mickiewicz High School in Poznań (VIII LO)
2008 – M.Sc. in Computer Science
2012 – Ph.D. in Intelligent Decision Support Systems
2017 – Habilitation in Computer-aided Decision Support
Research specialization – Multiple Criteria Decision Analysis
Over 40 international and Polish research awards
Main author and (informal) supervisor of the BSc Program in AI
Defining Artificial Intelligence (1)
Defining AI
Activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment
Nils J. Nilsson, Cambridge, 2010
A science and a set of computational technologies that are inspired by – but typically operate quite differently from – the way people use their nervous systems and bodies to sense, learn, reason and take action
P. Stone et al., Stanford, 2016
Defining Artificial Intelligence (2)
Characterizing AI depends on the credit one is willing to give software and hardware for "functioning appropriately" and "with foresight"
The differences in scale, speed, degree of autonomy, generality, …
electronic calculator (speed, no mistakes)
Deep Blue (1997; chess match against Garry Kasparov) (brute force methods, no single use of "intelligence")
The frontier of AI is moving far ahead (calculator vs. smartphone)
AI suffers from losing claim to its acquisitions (pattern: new technologies appear, people get accustomed to them, and they stop being considered AI)
Artificial Intelligence: Main Application Areas
Intelligence is a complex phenomenon
Frightening, futuristic visions of AI dominating films and novels are fictional (superhuman robots)
Abuse of AI technologies must be acknowledged
…, more importantly, AI is changing our lives
AI is improving human wealth, safety, and productivity
Major research universities devote departments to AI studies
Apple, Facebook, Google, IBM, and Microsoft explore AI applications
Application areas of AI:
Transportation
Healthcare
Education
Home/service robots
Public safety
Entertainment
Artificial Intelligence in Transportation and Logistics
AI in transportation
Smart cars (GPS; almost 100 sensors responsible for lane changing, self-parking, detecting objects in blind spots, pre-collision systems, …)
Self-driving cars: Google, Tesla (automatic perception, planning)
On-demand transportation: Uber or Lyft matching drivers/passengers
Self-driving delivery vehicles: Amazon drones
Carpooling/ridesharing: Zimride and Nuride bring people together for a joint trip
Transportation planning (bus/subway schedules; tracking traffic conditions (speed limits, smart pricing, traffic lights); routing trips; predictions about traffic conditions)
Artificial Intelligence in Healthcare and Medicine
AI in healthcare
Clinical decision support: mine outcomes from millions of patient clinical records to enable more personalized diagnosis and treatment, automated image interpretation
Mining social media: infer possible health risks, predict patients at risk
Devices/treatments: da Vinci or Computer Motion, millions of surgeries a year; better hearing aids and visual assistive devices
Patient monitoring and coaching: LifeGraph (behavioral patterns, introduce behavior modifications, alerts from data, identify groups of “people like me”)
Artificial Intelligence in Education and Teaching
AI in education
Teaching robots / tutoring systems / online learning: Ozobot teaches children to code and reason; Duolingo provides foreign language training; avatar-based training modules to train military personnel; …
Automated generation of questions: tests for thousands rather than tens
Coursera and Udacity make use of AI for grading short-answer and essay questions and programming assignments
Model common student misconceptions, predict which students are at risk of failure, and provide real-time student feedback
Artificial Intelligence in Public Safety
AI in public safety
Predictive policing applications and crime prevention: predicting when and where crimes are more likely to happen and who may commit them (CompStat; NYPD)
Detecting white-collar crimes (e.g., credit card fraud; cybersecurity)
Scanning Twitter and other feeds for certain types of events
Cameras for surveillance that can detect anomalies pointing to a possible crime
Artificial Intelligence in Everyday Life
AI in home robots and everyday devices
Vacuum cleaners: Electrolux, iRobot Roomba; obstacle avoidance, self-charging, dealing with full bins, electrical cords and rug tassels, building a complete 3D world model of a house
System in Module, System on Chip: low-cost devices able to support onboard AI
Interaction with people: speech understanding and image labeling
Smartphones: better photos; battery management; facial recognition (FaceID); voice assistants (Bixby, Google Assistant, Alexa, Siri); creating accurate and rich profiles of owners (mobile advertising, target customers, where to build the next store branch)
Artificial Intelligence in Entertainment
AI in entertainment
The Hollywood industry uses AI technologies to bring its fantasies to the screen
Software for composing music and recognizing soundtracks
Creating stage performances
Video games make use of computer vision and AI planning; an alternative existence in a virtual world (Second Life, World of Warcraft)
A Brief History of Artificial Intelligence
20th CENTURY
Born at a 1956 workshop organized by John McCarthy
Mostly an academic area of study, but… promised to deliver
Theorem proving, logic-based knowledge representation/reasoning
Planning (1970s and 1980s), expert and knowledge-based systems
Model-based approaches (physics-based approaches in robotics)
21st CENTURY
Started to deliver technologies that have a substantial impact on everyday lives
Success of the data-driven paradigm
Human-aware systems: accounting for the characteristics of users
Main Trends in Artificial Intelligence
large-scale machine learning (pattern mining from large data)
reinforcement learning (experience-driven sequential decision-making)
deep learning (neural networks)
robotics (training robots to interact with the world)
computer vision (machine perception)
natural language processing (text processing, speech recognition, machine translation)
Internet of things (interconnected devices that share/use information)
collaborative systems (autonomous systems that can work with other systems or humans)
neuromorphic computing
crowdsourcing and human computation
algorithmic game theory and computational social choice
Source: P. Stone et al., Artificial Intelligence and Life in 2030. One Hundred Year Study on Artificial Intelligence. Stanford, 2016
Introduction to Artificial Intelligence: Our Plan
Introduction to AI (your course)
I. Clustering (Data mining): K-means, Hierarchical clustering (TODAY)
II. Classification (Natural Language Processing): K-NN, Naïve Bayes
III. Classification (Machine Learning): Decision Trees, ID3, C4
IV. Evolutionary algorithms (Optimization)
V. Multi-criteria choice methods (Decision analysis): ELECTRE I
VI. Neural networks: linear and convolutional
VII. Search algorithms (A*)
VIII. Assessment test (small problems to solve and a few test questions)
What is Clustering in Data Mining?
Data mining: the process of discovering patterns in data sets; extract information from a data set and transform it into a comprehensible structure for further use
Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters
A cluster is a collection of data objects that are:
similar to one another (hence, can be treated collectively as one group)
as a collection, sufficiently different from other groups
(intra- vs. inter-cluster similarity)
Clustering can be seen as unsupervised classification (no pre-defined classes)
Why Do We Need Clustering?
General aim: help users understand the natural structure in a data set
Data reduction: summarization (preprocessing for regression, classification); compression (image processing)
Prediction based on groups: cluster and find characteristic patterns for each group (similar access patterns)
Finding nearest neighbors: localizing search to one or a small number of clusters
Outlier detection: outliers are often viewed as those "away" from any cluster
Historic Application of Clustering
John Snow, a London physician, plotted the locations of cholera deaths on a map during an outbreak in the 1850s
The locations indicated that cases were clustered around certain intersections where there were polluted wells, thus exposing both the problem and the solution
Earthquake studies: observed earthquake epicenters should be clustered along continental faults
Criminal investigation: crime detection and prevention
Clustered Search Results
Hierarchical clustering example: clustered search results
[Figure: results for "milosz kadzinski" with Carrot2]
Clustering search engines: Grouper, Carrot2, Vivisimo, SnakeT, Yippy
Perform clustering and labeling on the results of a search engine
Help users get a quick overview of the search results
Clustering Based on Ratings: MovieLens
Clustering and collaborative filtering: clustering based on ratings (MovieLens)
"MovieLens helps you find movies you will like. Rate movies to build a custom taste profile, then MovieLens recommends other movies for you to watch."
Non-commercial, personalized movie recommendations
Groups of users named after animals
Popular Clustering Applications
Clustering genes on microarray data: similar expression patterns imply coexpression of genes
Areas of similar land use in an earth observation database
Groups of motor insurance policy holders with a high average claim cost
City planning: identifying groups of houses according to their type, value and geographical location
Marketing: distinct groups in customer bases (develop target marketing programs)
Sales segmentation: what types of customers buy what products
Similar brands or products: identify competitors, potential market opportunities and available niches
Clustering Task: Basic Steps
Basic Steps to Develop a Clustering Task
Feature selection / data preprocessing: select info concerning the task of interest; may need to normalize/standardize data
Distance / similarity measure: similarity of two feature vectors
Clustering criterion: cost function or some rule
Clustering algorithms: choice of algorithm(s)
Validation of the results
Interpretation of the results with applications
Representation of Objects/Items (1)
today’s focus: vectors of numbers
User-pageview transaction matrix (rows: users; columns: documents / pages)
duration of a visit / number of page displays:
      D1   D2   D3   D4   D5
U1    0    3    2    0    2
U2    2    1    0    1    0
U3    0    3    0    0    2
has the user visited a page in a given session?
      D1   D2   D3   D4   D5
U1    0    1    1    0    1
U2    1    1    0    1    0
U3    0    1    0    0    1
Need for normalization of data
      D1   D2   D3   D4   D5
U1    80   40   20   60   100
U2    2    1    5    3    3
min-max normalization: $y = \frac{x - \min}{\max - \min}$
normalization of data for objects (min and max taken over each row):
      D1   D2   D3   D4   D5
U1    3/4  1/4  0    2/4  1
U2    1/4  0    1    2/4  2/4
e.g., for U2 and D1: (2 – 1) / (5 – 1) = 1/4
Representation of Objects/Items (2)
today’s focus: vectors of numbers
Need for normalization of data
      D1   D2   D3   D4   D5
U1    80   40   20   60   100
U2    2    1    5    3    3
U3    41   2    15   59   90
min-max normalization: $y = \frac{x - \min}{\max - \min}$
normalization of data for features (min and max taken over each column):
      D1   D2    D3   D4     D5
U1    1    1     1    1      1
U2    0    0     0    0      0
U3    1/2  1/39  2/3  56/57  87/97
e.g., for U3 and D1: (41 – 2) / (80 – 2) = 1/2
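The two normalization variants from these slides can be sketched in a few lines of Python. This is only an illustration: the function names (`min_max`, `normalize_objects`, `normalize_features`) are made up for this example, and exact fractions are used so the results can be compared with the slide values.

```python
from fractions import Fraction

def min_max(values):
    """Scale numbers to [0, 1] using y = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined when max == min")
    return [Fraction(v - lo, hi - lo) for v in values]

def normalize_objects(matrix):
    """Normalize each row (object) separately."""
    return [min_max(row) for row in matrix]

def normalize_features(matrix):
    """Normalize each column (feature) separately."""
    columns = [min_max(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

data = [
    [80, 40, 20, 60, 100],  # U1
    [2, 1, 5, 3, 3],        # U2
    [41, 2, 15, 59, 90],    # U3
]
by_object = normalize_objects(data[:2])   # slide (1): rows U1 and U2
by_feature = normalize_features(data)     # slide (2): columns D1..D5
print([str(v) for v in by_feature[2]])    # normalized row of U3
```

Normalizing per object makes the profiles of different users comparable, while normalizing per feature puts all columns on the same scale; which one is appropriate depends on the task.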
Popular Distance Metrics for Clustering
Feature vectors: $X = \langle x_1, x_2, \ldots, x_n \rangle$, $Y = \langle y_1, y_2, \ldots, y_n \rangle$
Euclidean distance: $ED(x, y) = \sqrt{(x_1 - y_1)^2 + \ldots + (x_n - y_n)^2}$
Manhattan distance: $MD(x, y) = |x_1 - y_1| + \ldots + |x_n - y_n|$
Chebyshev distance: $CD(x, y) = \max_{i=1,\ldots,n} |x_i - y_i|$
Example:
      D1   D2   D3   D4   D5
U1    0    3    2    0    2
U6    2    0    1    1    2
$ED(U1, U6) = \sqrt{(0-2)^2 + (3-0)^2 + (2-1)^2 + (0-1)^2 + (2-2)^2} = \sqrt{15} \approx 3.873$
$MD(U1, U6) = |0-2| + |3-0| + |2-1| + |0-1| + |2-2| = 7$
$CD(U1, U6) = \max\{|0-2|, |3-0|, |2-1|, |0-1|, |2-2|\} = 3$
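As a quick check of the three distances, here is a minimal Python sketch that recomputes the U1/U6 example. The function names are chosen for this illustration only.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]
print(round(euclidean(u1, u6), 3), manhattan(u1, u6), chebyshev(u1, u6))
# prints: 3.873 7 3
```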
Popular Similarity Measures for Clustering
Feature vectors: $X = \langle x_1, x_2, \ldots, x_n \rangle$, $Y = \langle y_1, y_2, \ldots, y_n \rangle$
simple matching similarity: $SM(x, y) = \sum_j x_j \cdot y_j$
cosine similarity: $\cos(x, y) = \frac{\sum_j x_j \cdot y_j}{\sqrt{\sum_j x_j^2} \cdot \sqrt{\sum_j y_j^2}} = \frac{\sum_j x_j \cdot y_j}{|x| \cdot |y|}$, where $|x|$ is the vector’s length
Example:
      D1   D2   D3   D4   D5   |Ux|
U1    0    3    2    0    2    4.12
U6    2    0    1    1    2    3.16
$SM(U1, U6) = 0 \cdot 2 + 3 \cdot 0 + 2 \cdot 1 + 0 \cdot 1 + 2 \cdot 2 = 6$
$\cos(U1, U6) = 6 / (4.12 \cdot 3.16) = 0.46$
General transformations:
distance(x,y) = 1 - similarity(x,y)
distance(x,y) = 1 / similarity(x,y)
Similarity scale: 1 = ideal; 0 = anti-ideal
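The two similarity measures, and the first of the general transformations, can be sketched as follows. The helper names are assumptions made for this example, and the error raised for zero-length vectors is a design choice of the sketch, not something stated on the slide.

```python
import math

def dot_similarity(x, y):
    """Simple matching similarity: sum of coordinate-wise products."""
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in x))

def cosine_similarity(x, y):
    """Dot product divided by the product of the vectors' lengths."""
    denominator = length(x) * length(y)
    if denominator == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_similarity(x, y) / denominator

def similarity_to_distance(similarity):
    """Transformation distance = 1 - similarity."""
    return 1 - similarity

u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]
print(dot_similarity(u1, u6))               # 6
print(round(cosine_similarity(u1, u6), 2))  # 0.46
```

The transformation `1 - similarity` only yields a sensible distance when the similarity is bounded by 1, as cosine similarity is; it would not make sense for the unbounded dot product.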
Quality: What Is Good Clustering?
A good clustering method will produce high-quality clusters:
high intra-class similarity: cohesive within clusters
low inter-class similarity: distinctive between clusters
The quality of a clustering method depends on the similarity measure used, its implementation, and its ability to discover some or all of the hidden patterns
Partitioning approach: constructs various partitions and evaluates them by some criterion (K-means, K-medoids, CLARANS)
Hierarchical approach: hierarchical decomposition of the set of data (objects) using some criterion (Diana, Agnes, BIRCH, CHAMELEON)
Density-based approach: based on connectivity and density functions (DBSCAN, OPTICS, DenClue)
Model-based approach: a model is hypothesized for each cluster and the best fit of that model is sought (EM, SOM, COBWEB)
More: grid-based (STING, CLIQUE), frequent pattern-based (pCluster), user-guided or constraint-based (COD, constrained clustering)
Partitioning Approaches
The notion of comparing item similarities can be extended to clusters themselves, by focusing on a representative vector for each cluster
cluster representatives can be actual items in the cluster or other “virtual” representatives such as the centroid
reduces the number of similarity computations in clustering
clusters are revised successively until a stopping condition is satisfied, or until no more changes to clusters can be made
Reallocation-Based Partitioning Methods
Start with an initial assignment of items to clusters and then move items from cluster to cluster to obtain an improved partitioning
Most common algorithm: k-means
Example: centroid C of a cluster containing U2, U3 and U7
      D1   D2   D3   D4   D5
U2    2    1    0    1    0
U3    0    3    0    0    2
U7    1    0    2    2    0
C     1    4/3  2/3  1    2/3
e.g., for D1: (2 + 0 + 1) / 3 = 1
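The centroid computation from the example (cluster of U2, U3 and U7) is a coordinate-wise mean. A minimal sketch, with a function name chosen for this illustration and exact fractions so that the result matches the table:

```python
from fractions import Fraction

def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [Fraction(sum(column), n) for column in zip(*vectors)]

cluster = [
    [2, 1, 0, 1, 0],  # U2
    [0, 3, 0, 0, 2],  # U3
    [1, 0, 2, 2, 0],  # U7
]
c = centroid(cluster)
print([str(v) for v in c])  # ['1', '4/3', '2/3', '1', '2/3']
```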
K-Means Clustering Method - Example (1)
Given the number of desired clusters K:
1. Randomly assign objects to create K nonempty initial partitions (clusters)
2. Compute the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest centroid (reallocation)
4. Repeat steps 2 and 3 until the assignment does not change
User-document matrix:
      D1   D2   D3   D4   D5
U1    0    3    2    0    2
U2    2    1    0    1    0
U3    0    3    0    0    2
U4    1    2    0    2    1
U5    0    1    3    0    1
U6    2    0    1    1    2
U7    1    0    2    2    0
U8    3    1    0    0    2
Initial (arbitrary) assignment: C1={U4}, C2={U6}, C3={U7}
Compute the similarity of each item to each cluster (simple matching (dot product) as the similarity measure):
           U1   U2   U3   U4   U5   U6   U7   U8
C1 (U4)    8    6    8    10   3    6    5    7
C2 (U6)    6    5    4    6    5    10   6    10
C3 (U7)    4    4    0    5    6    6    9    3
Allocate each user to the cluster to which it has the highest similarity (the largest value in each column of the above table)
C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}
End of the first iteration
K-Means Clustering Method - Example (2)
We repeat the process for another reallocation, starting from: C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}
Compute new cluster centroids using the original user-document matrix:
      D1   D2   D3   D4   D5
C1    3/4  9/4  2/4  3/4  5/4
C2    5/2  1/2  1/2  1/2  4/2
C3    1/2  1/2  5/2  2/2  1/2
Compute a new centroid-user similarity matrix:
      U1     U2    U3    U4    U5    U6    U7    U8
C1    10.25  4.5   9.25  8     5     5.25  3.25  7
C2    6.5    6     5.5   6.5   4     10    4.5   12
C3    7.5    2.5   2.5   4     8.5   5.5   7.5   3
Reallocate the items to clusters with the highest similarity:
C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
End of the second iteration
K-Means Clustering Method - Example (3)
We repeat the process for another reallocation, starting from: C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
Compute new cluster centroids using the original user-document matrix:
      D1   D2   D3   D4   D5
C1    1/3  8/3  2/3  2/3  5/3
C2    7/3  2/3  1/3  2/3  4/3
C3    1/2  1/2  5/2  2/2  1/2
Compute a new centroid-user similarity matrix:
      U1     U2    U3     U4    U5    U6    U7    U8
C1    12.67  4     11.33  8.67  6.33  5.33  3     7
C2    5.33   6     4.67   6.33  3     8.33  4.33  10.33
C3    7.5    2.5   2.5    4     8.5   5.5   7.5   3
Reallocate the items to clusters with the highest similarity:
C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
No change to the clusters: terminate the algorithm
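The three iterations of this example can be reproduced with a short Python sketch. It follows the lecture's variant, which reallocates each user to the cluster whose centroid has the highest dot-product similarity (rather than the smallest distance). The function names, the tie-breaking rule (first cluster wins) and the lack of special handling for empty clusters are assumptions of this sketch.

```python
from fractions import Fraction

USERS = {
    "U1": [0, 3, 2, 0, 2], "U2": [2, 1, 0, 1, 0],
    "U3": [0, 3, 0, 0, 2], "U4": [1, 2, 0, 2, 1],
    "U5": [0, 1, 3, 0, 1], "U6": [2, 0, 1, 1, 2],
    "U7": [1, 0, 2, 2, 0], "U8": [3, 1, 0, 0, 2],
}

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def centroid(vectors):
    n = len(vectors)
    return [Fraction(sum(col), n) for col in zip(*vectors)]

def k_means_dot(data, initial_clusters, max_iter=100):
    """Return (clusters, number of iterations until the assignment is stable)."""
    clusters = [sorted(c) for c in initial_clusters]
    for iteration in range(1, max_iter + 1):
        # Step 2: centroids of the current clusters.
        centroids = [centroid([data[u] for u in c]) for c in clusters]
        # Step 3: reallocate every user to the most similar centroid.
        new_clusters = [[] for _ in clusters]
        for name in sorted(data):
            sims = [dot(data[name], c) for c in centroids]
            new_clusters[sims.index(max(sims))].append(name)
        # Step 4: stop when the assignment no longer changes.
        if new_clusters == clusters:
            return clusters, iteration
        clusters = new_clusters
    return clusters, max_iter

clusters, iterations = k_means_dot(USERS, [["U4"], ["U6"], ["U7"]])
print(clusters, iterations)
```

Starting from C1={U4}, C2={U6}, C3={U7}, the sketch ends with the same clusters as the slides, and the third pass is the one that detects no change.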
K-Means Clustering Method
[Figure: k-means illustrated in four panels with axes X and Y, clusters C1, C2, C3 and centers k1, k2, k3: (1) pick initial cluster centers; (2) assign each point to the closest cluster center; (3) move each cluster center to the mean of each cluster; (4) reassign points closest to a different new cluster center]
K-Means Clustering Method - Summary
Strength
Simple, understandable
Relatively efficient; complexity dependent on t·k·n, where n – no. of objects, k – no. of clusters, and t – no. of iterations
Often terminates at a local optimum
Restart with different random seeds (increases the chance of finding the global optimum)
Quality criterion: $J = \sum_{j=1}^{K} \sum_{x \in C_j} sim(x, m_j)$, where $m_j$ is the mean of cluster $C_j$
Weakness
Applicable only when the mean is defined
Need to specify k (→ X-means)
Results can vary vastly depending on the seeds
Unable to handle noisy data or outliers
K-medoids – instead of mean, use medians of each cluster
mean of 1, 3, 5, 7, 9 is 5
mean of 1, 3, 5, 7, 1009 is 205
median of 1, 3, 5, 7, 1009 is 5
median: not affected by extreme values
Variations of k-means differ in:
Selection of the initial k means
Distance or similarity measures used
Strategies to calculate cluster means
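Two points from this summary are easy to check numerically: the effect of an extreme value on the mean versus the median, and the J criterion. The sketch below reads J as the sum of the similarities between each object and the representative of the cluster it is assigned to, which is an interpretation consistent with the lecture's first-iteration similarity table; the function name `j_measure` is made up for this example.

```python
from statistics import mean, median

# Mean vs. median in the presence of an extreme value.
print(mean([1, 3, 5, 7, 9]), mean([1, 3, 5, 7, 1009]), median([1, 3, 5, 7, 1009]))
# prints: 5 205 5

def j_measure(similarity, clusters):
    """Sum of similarities between every object and its own cluster representative."""
    return sum(
        similarity[cluster][member]
        for cluster, members in clusters.items()
        for member in members
    )

# Similarities from the first iteration of the lecture example.
similarity = {
    "C1": {"U1": 8, "U2": 6, "U3": 8, "U4": 10, "U5": 3, "U6": 6, "U7": 5, "U8": 7},
    "C2": {"U1": 6, "U2": 5, "U3": 4, "U4": 6, "U5": 5, "U6": 10, "U7": 6, "U8": 10},
    "C3": {"U1": 4, "U2": 4, "U3": 0, "U4": 5, "U5": 6, "U6": 6, "U7": 9, "U8": 3},
}
clusters = {"C1": ["U1", "U2", "U3", "U4"], "C2": ["U6", "U8"], "C3": ["U5", "U7"]}
print(j_measure(similarity, clusters))  # 67
```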
Hierarchical Clustering Approaches
Two main types of hierarchical clustering
Agglomerative
Start with the points as individual clusters
At each step, merge the closest pair of clusters until a stopping criterion is met (e.g., one cluster left)
Divisive
Start with one, all-inclusive cluster
At each step, split a cluster until a stopping criterion is met (e.g., each cluster contains a single point)
Traditional hierarchical algorithms use a similarity or distance matrix
Merge or split one cluster at a time
[Figure: objects A, B, C, D, E combined into AB, CD, CDE and finally ABCDE; the agglomerative approach runs from Step 0 to Step 4, the divisive approach goes through the same steps in the opposite direction]
Hierarchical Agglomerative Clustering
Basic procedure
1. Place each of N items into a cluster of its own
2. Compute all pairwise item-item similarity coefficients
3. Form a new cluster by combining the most similar pair of current clusters Ci and Cj
   Update the similarity matrix by deleting rows/columns corresponding to Ci and Cj
   Calculate the entries in the row corresponding to the new cluster Ci+j
4. Repeat step 3 (forming a new cluster) until a stopping criterion is met
Methods for computing similarity between clusters: single-link, complete-link, group average, centroid method
[Figure: nested clusters of points A to F]
HAC - Distance Between Two Clusters (1)
Single-link distance between clusters Ci and Cj is the minimum distance between any object in Ci and any object in Cj
The distance is defined by the two closest objects (data points):
$dist(C_i, C_j) = \min_{x,y} \{ dist(x, y) : x \in C_i, y \in C_j \}$
Single-link similarity between clusters Ci and Cj is the maximum similarity between any object in Ci and any object in Cj
The similarity is defined by the two most similar objects:
$sim(C_i, C_j) = \max_{x,y} \{ sim(x, y) : x \in C_i, y \in C_j \}$
It can find arbitrarily shaped clusters, but may cause the undesirable “chain effect” due to noisy points
HAC - Example Incorporating Single-Link Similarity
Similarity matrix
      U1    U2    U3    U4    U5
U1    1     0.9   0.1   0.65  0.2
U2    0.9   1     0.7   0.6   0.5
U3    0.1   0.7   1     0.4   0.3
U4    0.65  0.6   0.4   1     0.8
U5    0.2   0.5   0.3   0.8   1
After merging U1 and U2 (similarity 0.9):
      U12   U3    U4    U5
U12   1     0.7   0.65  0.5
U3    0.7   1     0.4   0.3
U4    0.65  0.4   1     0.8
U5    0.5   0.3   0.8   1
sim(U12,U3) = max{sim(U1,U3), sim(U2,U3)} = max{0.1, 0.7} = 0.7
After merging U4 and U5 (similarity 0.8):
      U12   U3    U45
U12   1     0.7   0.65
U3    0.7   1     0.4
U45   0.65  0.4   1
sim(U12,U45) = max{sim(U12,U4), sim(U12,U5)} = max{0.65, 0.5} = 0.65
After merging U12 and U3 (similarity 0.7):
       U123  U45
U123   1     0.65
U45    0.65  1
After merging U123 and U45 (similarity 0.65), a single cluster U12345 remains
Dendrogram: leaves U1, U2, U3, U4, U5; merges at similarity levels 0.9, 0.8, 0.7 and 0.65
Possible stopping criteria:
number of clusters
similarity thresholds (do not combine clusters which are not similar)
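The single-link example can be traced with a small Python sketch of agglomerative clustering that updates the similarity matrix after every merge: the similarity between the new cluster and any other cluster is the maximum of the two old entries. The function name, the tuple-based cluster labels and the dictionary representation of the matrix are choices made for this illustration.

```python
def single_link_hac(names, matrix):
    """Return the list of merges as (members of the new cluster, similarity level)."""
    clusters = [(name,) for name in names]
    # Similarity of every unordered pair of current clusters.
    sim = {
        frozenset((clusters[i], clusters[j])): matrix[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    merges = []
    while len(clusters) > 1:
        pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]]
        a, b = max(pairs, key=lambda pair: sim[frozenset(pair)])
        level = sim[frozenset((a, b))]
        merged = tuple(sorted(a + b))
        rest = [c for c in clusters if c not in (a, b)]
        # Single link: the new row holds the maximum of the two merged rows.
        for c in rest:
            sim[frozenset((merged, c))] = max(
                sim[frozenset((a, c))], sim[frozenset((b, c))]
            )
        clusters = rest + [merged]
        merges.append((merged, level))
    return merges

names = ["U1", "U2", "U3", "U4", "U5"]
matrix = [
    [1, 0.9, 0.1, 0.65, 0.2],
    [0.9, 1, 0.7, 0.6, 0.5],
    [0.1, 0.7, 1, 0.4, 0.3],
    [0.65, 0.6, 0.4, 1, 0.8],
    [0.2, 0.5, 0.3, 0.8, 1],
]
merges = single_link_hac(names, matrix)
for members, level in merges:
    print("+".join(members), level)
```

The merge levels returned by the sketch correspond to the heights at which branches join in the dendrogram.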
HAC - Distance Between Two Clusters (2)
Complete-link distance between clusters Ci and Cj is the maximum distance between any object in Ci and any object in Cj
The distance is defined by the two furthest objects (data points):
$dist(C_i, C_j) = \max_{x,y} \{ dist(x, y) : x \in C_i, y \in C_j \}$
Complete-link similarity between clusters Ci and Cj is the minimum similarity between any object in Ci and any object in Cj
The similarity is defined by the two least similar objects:
$sim(C_i, C_j) = \min_{x,y} \{ sim(x, y) : x \in C_i, y \in C_j \}$
It is sensitive to outliers because the distance is determined by the objects that are furthest away from each other
HAC - Example Incorporating Complete-Link Similarity
Similarity matrix
      U1    U2    U3    U4    U5
U1    1     0.9   0.1   0.65  0.2
U2    0.9   1     0.7   0.6   0.5
U3    0.1   0.7   1     0.4   0.3
U4    0.65  0.6   0.4   1     0.8
U5    0.2   0.5   0.3   0.8   1
After merging U1 and U2 (similarity 0.9):
      U12   U3    U4    U5
U12   1     0.1   0.6   0.2
U3    0.1   1     0.4   0.3
U4    0.6   0.4   1     0.8
U5    0.2   0.3   0.8   1
sim(U12,U3) = min{sim(U1,U3), sim(U2,U3)} = min{0.1, 0.7} = 0.1
After merging U4 and U5 (similarity 0.8):
      U12   U3    U45
U12   1     0.1   0.2
U3    0.1   1     0.3
U45   0.2   0.3   1
sim(U12,U45) = min{sim(U12,U4), sim(U12,U5)} = min{0.6, 0.2} = 0.2
After merging U3 and U45 (similarity 0.3):
       U12   U345
U12    1     0.1
U345   0.1   1
After merging U12 and U345 (similarity 0.1), a single cluster U12345 remains
Dendrogram: leaves U1, U2, U3, U4, U5; merges at similarity levels 0.9, 0.8, 0.3 and 0.1
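For comparison, the complete-link example can be sketched directly from the definition: the similarity of two clusters is the minimum pairwise similarity of their members. The optional `threshold` parameter, which stops merging when the best available similarity falls below it, is an assumption added here to illustrate a similarity-threshold stopping criterion; the function names are also chosen for this example.

```python
from itertools import combinations

def complete_link_similarity(c1, c2, sim):
    """Minimum similarity over all pairs (a in c1, b in c2)."""
    return min(sim[a][b] for a in c1 for b in c2)

def complete_link_hac(sim, threshold=0.0):
    """Return (final clusters, similarity levels of the performed merges)."""
    clusters = [(name,) for name in sim]
    levels = []
    while len(clusters) > 1:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: complete_link_similarity(pair[0], pair[1], sim),
        )
        level = complete_link_similarity(a, b, sim)
        if level < threshold:
            break  # remaining clusters are not similar enough to combine
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        levels.append(level)
    return clusters, levels

names = ["U1", "U2", "U3", "U4", "U5"]
matrix = [
    [1, 0.9, 0.1, 0.65, 0.2],
    [0.9, 1, 0.7, 0.6, 0.5],
    [0.1, 0.7, 1, 0.4, 0.3],
    [0.65, 0.6, 0.4, 1, 0.8],
    [0.2, 0.5, 0.3, 0.8, 1],
]
sim = {a: dict(zip(names, row)) for a, row in zip(names, matrix)}

print(complete_link_hac(sim))                  # full hierarchy
print(complete_link_hac(sim, threshold=0.25))  # stops before the 0.1 merge
```

Recomputing cluster similarities from the original matrix is simpler to read than updating the matrix in place, at the cost of repeated work on larger data sets.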
HAC - Distance Between Two Clusters (3)
Average-link distance between clusters Ci and Cj is the average of all pair-wise distances between the data points in the two clusters:
$dist(C_i, C_j) = \mathrm{average}_{x,y} \{ dist(x, y) : x \in C_i, y \in C_j \}$
A compromise between:
the sensitivity of complete-link clustering to outliers
the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects
Centroid method: the distance between two clusters is the distance between their centroids
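The average-link and centroid methods can be contrasted on a tiny, made-up example. The sketch assumes Euclidean distance between points (the slide does not fix a particular distance), and the two clusters below are hypothetical data chosen so that the two methods give different values.

```python
import math

def average_link_distance(c1, c2):
    """Average of all pairwise Euclidean distances between points of c1 and c2."""
    total = sum(math.dist(x, y) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))

def centroid_distance(c1, c2):
    """Euclidean distance between the centroids (mean points) of c1 and c2."""
    m1 = [sum(col) / len(c1) for col in zip(*c1)]
    m2 = [sum(col) / len(c2) for col in zip(*c2)]
    return math.dist(m1, m2)

# Hypothetical 2D clusters used only for illustration.
left = [(0, 0), (0, 2)]
right = [(4, 0), (4, 2)]
print(round(average_link_distance(left, right), 3))  # 4.236
print(centroid_distance(left, right))                # 4.0
```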
Summary (1)
Given the representation of five users and the cosine similarity matrix, use 2-means to group these users into two clusters:
Representation of users:
      U1    U2    U3    U4    U5
D1    0.2   0.5   0.7   0.7   0.8
D2    0.7   0.2   0.6   0.3   0.6
Cosine similarity matrix:
      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1
I) Assume the least similar users are the initial centroids. Which users would be used?
II) What would be the clustering obtained after the first iteration? Groups G1 and G2? Would it differ in case U3 and U4 were used as the centroids?
III) Compute the J measure after the first iteration for the above data.
Hint: for our lecture example:
      U1   U2   U3   U4   U5   U6   U7   U8
C1    8    6    8    10   3    6    5    7
C2    6    5    4    6    5    10   6    10
C3    4    4    0    5    6    6    9    3
J = (8 + 6 + 8 + 10) + (10 + 10) + (6 + 9) = 67
IV) Compute the new centroids after the first iteration (for the case of starting with U1 and U2 as the initial centroids).
      C1    C2
D1    0.2   ?
D2    0.7   ?
Summary (2)
Given the cosine similarity matrix for five users, use agglomerative hierarchical clustering (AHC) to group these users:
      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1
I) Which users would be clustered together first (irrespective of how the similarity between groups is defined)?
II) Compute the similarity matrix after the first iteration while assuming that the similarity between groups is equal to the maximal/minimal/average similarity of the users contained in these clusters.
      U1    U24   U3    U5
U1    1     ?     0.83  0.79
U24   ?     1     ?     ?
U3    0.83  ?     1     0.99
U5    0.79  ?     0.99  1
sim(U1,U24) = max/min/ave{sim(U1,U2), sim(U1,U4)} = max/min/ave{0.61, 0.63}
III) Present the process of AHC by means of a dendrogram.
IV) How many groups would be obtained if the similarity threshold for AHC were set to 0.8?