Introduction to Artificial Intelligence Data Mining with ... · FaceID); voice assistants (Bixby, Google Assistant, Alexa, Siri), creating accurate and rich profiles of owners (mobile

Introduction to Artificial Intelligence

Introduction to Artificial Intelligence Data Mining with Clustering Algorithms

Miłosz Kadziński Institute of Computing Science

Poznan University of Technology, Poland

www.cs.put.poznan.pl/mkadzinski/iai


A Few Words About Me

Miłosz Kadziński
e-mail: [email protected]
• please use [IAI] in the e-mail’s subject
ph.: +48 61 665 3022
room: 1.6.6 (Technical Library, BT; 1st floor)
consultation hours: Wed 9:45 – 11:15
slides: www.cs.put.poznan.pl/mkadzinski/iai

2003 – Adam Mickiewicz High School in Poznań (VIII LO)
2008 – M.Sc. in Computer Science
2012 – Ph.D. in Intelligent Decision Support Systems
2017 – Habilitation in Computer-aided Decision Support

Research specialization – Multiple Criteria Decision Analysis
Over 40 international and Polish research awards
Main author and (informal) supervisor of the BSc Program in AI


Defining Artificial Intelligence (1)

Defining AI

”Activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment”
Nils J. Nilsson, Cambridge, 2010

”A science and a set of computational technologies that are inspired by, but typically operate quite differently from, the way people use their nervous systems and bodies to sense, learn, reason and take action”
P. Stone et al., Stanford, 2016


Defining Artificial Intelligence (2)

Characterizing AI depends on the credit one is willing to give software and hardware for ”functioning appropriately” and ”with foresight”
The differences lie in scale, speed, degree of autonomy, generality, …

electronic calculator (speed, no mistakes)

Deep Blue (1997; chess match against Garry Kasparov) (brute force methods, no single use of ”intelligence”)

The frontier of AI is moving far ahead (calculator vs. smartphone)
AI suffers from losing claim to its acquisitions (pattern: new technologies appear, people get accustomed to them, and they stop being considered AI)


Artificial Intelligence: Main Application Areas

Intelligence is a complex phenomenon
Frightening, futurist visions of AI dominating films and novels are fictional (superhuman robots)
Abuse of AI technologies must be acknowledged

…, more importantly, AI is changing our lives

AI is improving human wealth, safety, and productivity
Major research universities devote departments to AI studies
Apple, Facebook, Google, IBM, and Microsoft explore AI applications

Main application areas of AI:
Transportation
Healthcare
Education
Home/service robots
Public safety
Entertainment


Artificial Intelligence in Transportation and Logistics

AI in transportation

Smart cars (GPS; almost 100 sensors responsible for lane changing, self-parking, detecting objects in blind spots, pre-collision systems, …)

Self-driving cars: Google, Tesla (automatic perception, planning)

On-demand transportation: Uber or Lyft, matching drivers/passengers

Self-driving delivery vehicles: Amazon drones

Carpooling/ridesharing: Zimride and Nuride bring people together for a joint trip

Transportation planning (bus/subway schedules, tracking traffic conditions (speed limits, smart pricing, traffic lights), routing trips, predictions about traffic conditions)


Artificial Intelligence in Healthcare and Medicine

AI in healthcare

Clinical decision support: mine outcomes from millions of patient clinical records to enable more personalized diagnosis and treatment, automated image interpretation

Mining social media: infer possible health risks, predict patients at risk

Devices/treatments: da Vinci or Computer Motion, millions of surgeries a year; better hearing aids and visual assistive devices

Patient monitoring and coaching: LifeGraph (behavioral patterns, introduce behavior modifications, alerts from data, identify groups of “people like me”)


Artificial Intelligence in Education and Teaching

AI in education

Teaching robots / tutoring systems / online learning: Ozobot teaches children to code and reason; Duolingo provides foreign language training; avatar-based training modules to train military personnel; …

Automated generation of questions: tests for thousands rather than tens

Coursera and Udacity make use of AI for grading short-answer questions, essay questions and programming assignments

Model common student misconceptions, predict which students are at risk of failure, and provide real-time student feedback


Artificial Intelligence in Public Safety

AI in public safety

Predictive policing applications and crime prevention: predicting when and where crimes are more likely to happen and who may commit them (CompStat; NYPD)

Detecting white collar crimes (e.g., credit card fraud; cybersecurity)

Scanning Twitter and other feeds for certain types of events

Cameras for surveillance that can detect anomalies pointing to a possible crime


Artificial Intelligence in Everyday Life

AI in home robots and everyday devices

Vacuum cleaners: Electrolux, iRobot Roomba; obstacle avoidance, self-charging, dealing with full bins, electrical cords and rug tassels, building a complete 3D world model of a house

System in Module, System on Chip: low cost devices able to support onboard AI

Interaction with people: speech understanding and image labeling

Smartphones: better photos; battery management; facial recognition (FaceID); voice assistants (Bixby, Google Assistant, Alexa, Siri), creating accurate and rich profiles of owners (mobile advertising, target customers, where to build a next store branch)


Artificial Intelligence in Entertainment

AI in entertainment

Hollywood industry uses AI technologies to bring its fantasies to the screen

Software for composing music and recognizing soundtracks

Creating stage performances

Video games make use of computer vision and AI planning; an alternative existence in a virtual world (Second Life, World of Warcraft)


A Brief History of Artificial Intelligence

20th CENTURY
Born at a 1956 workshop organized by John McCarthy
Mostly academic area of study, but… promised to deliver
Theorem proving, logic-based knowledge representation/reasoning
Planning (1970s and 1980s), expert and knowledge-based systems
Model-based approaches (physics-based approaches in robotics)

21st CENTURY
Started to deliver technologies that have a substantial impact on everyday lives
Success of the data-driven paradigm
Human-aware systems: accounting for the characteristics of users


Main Trends in Artificial Intelligence

P. Stone et al., Artificial Intelligence and Life in 2030. One Hundred Year Study on Artificial Intelligence. Stanford, 2016

large-scale machine learning (pattern mining from large data)
reinforcement learning (experience-driven sequential decision-making)
deep learning (neural networks)
robotics (training robots to interact with the world)
computer vision (machine perception)
natural language processing (text processing, speech recognition, machine translation)
Internet of things (interconnected devices that share/use information)
collaborative systems (autonomous systems that can work with other systems or humans)
neuromorphic computing
crowdsourcing and human computation
algorithmic game theory and computational social choice


Introduction to Artificial Intelligence: Our Plan

Introduction to AI (your course)

I. Clustering (Data Mining): K-means, Hierarchical clustering (TODAY)
II. Classification (Natural Language Processing): K-NN, Naïve Bayes
III. Classification (Machine Learning): Decision Trees, ID3, C4
IV. Evolutionary algorithms (Optimization)
V. Multi-criteria choice methods (Decision analysis): ELECTRE I
VI. Neural networks: linear and convolutional
VII. Search algorithms (A*)
VIII. Assessment test (small problems to solve and a few test questions)


Clustering in Data Mining

What is Clustering in Data Mining?

Data mining: the process of discovering patterns in data sets; extract information from a data set and transform it into a comprehensible structure for further use

Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters

A cluster is a collection of data objects that are:
similar to one another (hence, can be treated collectively as one group)
as a collection, sufficiently different from other groups
(intra- vs. inter-cluster similarity)

Clustering can be seen as unsupervised classification (no pre-defined classes)


Why Do We Need Clustering?

General aim: help users understand the natural structure in a data set

Prediction based on groups: cluster and find characteristic patterns for each group (similar access patterns)

Data reduction: summarization (preprocessing for regression, classification); compression (image processing)

Finding nearest neighbors: localizing search to one or a small number of clusters

Outlier detection: outliers are often viewed as those ”away” from any cluster


Historic Application of Clustering

John Snow, a London physician, plotted the location of cholera deaths on a map during an outbreak in the 1850s
The locations indicated that cases were clustered around certain intersections where there were polluted wells, thus exposing both the problem and the solution

Earthquake studies: observed earthquake epicenters should be clustered along continental faults
Criminal investigation: crime detection and prevention


Clustered Search Results

Hierarchical Clustering: example – clustered search results

[Figure: results for ”milosz kadzinski” with Carrot2]

Clustering search engines: Grouper, Carrot2, Vivisimo, SnakeT, Yippy
Perform clustering and labeling on the results of a search engine
Help users to find a quick overview of the search results


Clustering Based on Ratings: MovieLens

Clustering and Collaborative Filtering: clustering based on ratings (MovieLens)

”MovieLens helps you find movies you will like. Rate movies to build a custom taste profile, then MovieLens recommends other movies for you to watch.”

Non-commercial, personalized movie recommendations

Groups of users named after animals


Popular Clustering Applications

Clustering genes on microarray data: similar expression patterns imply coexpression of genes

Areas of similar land use in an earth observation database

Groups of motor insurance policy holders with a high average claim cost

City planning: identifying groups of houses according to their type, value and geographical locations

Marketing: distinct groups in customer bases (develop target marketing programs)

Sales segmentation: what types of customers buy what products

Similar brands or products: identify competitors, potential market opportunities and available niches


Clustering Task: Basic Steps

Basic Steps to Develop a Clustering Task

Feature selection / data preprocessing
  Select info concerning the task of interest
  May need to normalize/standardize data

Distance / similarity measure
  Similarity of two feature vectors

Clustering criterion
  Cost function or some rule

Clustering algorithms
  Choice of algorithm(s)

Validation of the results

Interpretation of the results with applications


Representation of Objects/Items (1)

User-pageview transaction matrix (rows: users; columns: documents / pages)

Duration of a visit / number of page displays:

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U2     2   1   0   1   0
U3     0   3   0   0   2

Has the user visited a page in a given session?

      D1  D2  D3  D4  D5
U1     0   1   1   0   1
U2     1   1   0   1   0
U3     0   1   0   0   1

Need for normalization of data (today’s focus: vectors of numbers)

      D1  D2  D3  D4   D5
U1    80  40  20  60  100
U2     2   1   5   3    3

min-max normalization: $y = \frac{x - \min}{\max - \min}$

Normalization of data for objects (min and max are taken over each user's row):

      D1   D2   D3   D4   D5
U1    3/4  1/4  0    2/4  1
U2    1/4  0    1    2/4  2/4

Example (U2, D1): (2 – 1) / (5 – 1) = 1/4
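The per-object normalization can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `min_max_normalize` and the use of `Fraction` (to reproduce the exact fractions from the slide) are choices made here, not part of the lecture.

```python
from fractions import Fraction

def min_max_normalize(values):
    """Rescale numbers to [0, 1] using y = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant vector is mapped to zeros to avoid dividing by zero.
        return [Fraction(0) for _ in values]
    return [Fraction(x - lo, hi - lo) for x in values]

# Rows of the slide's matrix: one vector per user (object).
visits = {
    "U1": [80, 40, 20, 60, 100],
    "U2": [2, 1, 5, 3, 3],
}

# Normalization for objects: min and max are computed within each row.
normalized_by_user = {user: min_max_normalize(row) for user, row in visits.items()}

for user, row in normalized_by_user.items():
    print(user, [str(v) for v in row])
```

Note that `Fraction` prints reduced values, so the slide's 2/4 appears as 1/2.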


Representation of Objects/Items (2)

Need for normalization of data (today’s focus: vectors of numbers)

      D1  D2  D3  D4   D5
U1    80  40  20  60  100
U2     2   1   5   3    3
U3    41   2  15  59   90

min-max normalization: $y = \frac{x - \min}{\max - \min}$

Normalization of data for features (min and max are taken over each column):

      D1   D2    D3   D4     D5
U1    1    1     1    1      1
U2    0    0     0    0      0
U3    1/2  1/39  2/3  56/57  87/97

Example (U3, D1): (41 – 2) / (80 – 2) = 1/2
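Normalizing per feature means applying the same min-max formula column by column. A minimal sketch, assuming the three-user matrix from this slide; the helper name `normalize_columns` is hypothetical.

```python
from fractions import Fraction

# Rows are users U1..U3, columns are documents D1..D5 (values from the slide).
matrix = [
    [80, 40, 20, 60, 100],  # U1
    [2, 1, 5, 3, 3],        # U2
    [41, 2, 15, 59, 90],    # U3
]

def normalize_columns(rows):
    """Min-max normalize every column (feature) independently."""
    scaled_columns = []
    for col in zip(*rows):  # zip(*rows) iterates over columns
        lo, hi = min(col), max(col)
        span = hi - lo
        # Assumption: a constant column is mapped to zeros.
        scaled_columns.append(
            [Fraction(x - lo, span) if span else Fraction(0) for x in col]
        )
    # Transpose back so that each inner list is again one user.
    return [list(row) for row in zip(*scaled_columns)]

normalized_by_feature = normalize_columns(matrix)
for name, row in zip(["U1", "U2", "U3"], normalized_by_feature):
    print(name, [str(v) for v in row])
```

With feature-wise scaling, U1 and U2 become the all-ones and all-zeros vectors because they hold the column maximum and minimum for every document.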


Popular Distance Metrics for Clustering

Feature vectors: X = < x1, x2, …, xn >, Y = < y1, y2, …, yn >

Euclidean distance: $d_E(x,y) = \sqrt{(x_1 - y_1)^2 + \dots + (x_n - y_n)^2}$

Manhattan distance: $d_M(x,y) = |x_1 - y_1| + \dots + |x_n - y_n|$

Chebyshev distance: $d_C(x,y) = \max_{i=1,\dots,n} |x_i - y_i|$

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U6     2   0   1   1   2

$ED(U1,U6) = \sqrt{(0-2)^2 + (3-0)^2 + (2-1)^2 + (0-1)^2 + (2-2)^2} = \sqrt{15} = 3.873$

$MD(U1,U6) = |0-2| + |3-0| + |2-1| + |0-1| + |2-2| = 7$

$CD(U1,U6) = \max\{|0-2|, |3-0|, |2-1|, |0-1|, |2-2|\} = 3$
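The three distances can be checked with a short script. This is a sketch using only the standard library; the function names are chosen here for illustration.

```python
import math

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]

print(round(euclidean(u1, u6), 3))  # 3.873
print(manhattan(u1, u6))            # 7
print(chebyshev(u1, u6))            # 3
```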


Popular Similarity Measures for Clustering

Feature vectors: X = < x1, x2, …, xn >, Y = < y1, y2, …, yn >

simple matching similarity: $SM(x,y) = \sum_j x_j \cdot y_j$

cosine similarity: $\cos(x,y) = \frac{\sum_j x_j \cdot y_j}{\sqrt{\sum_j x_j^2} \cdot \sqrt{\sum_j y_j^2}} = \frac{\sum_j x_j \cdot y_j}{|x| \cdot |y|}$, where $|x|$ is the vector’s length

      D1  D2  D3  D4  D5  |Ux|
U1     0   3   2   0   2  4.12
U6     2   0   1   1   2  3.16

SM(U1,U6) = 0·2 + 3·0 + 2·1 + 0·1 + 2·2 = 6

cos(U1,U6) = 6 / (4.12·3.16) = 0.46

General transformations (similarity: 1 = ideal; 0 = anti-ideal):
distance(x,y) = 1 - similarity(x,y)
distance(x,y) = 1 / similarity(x,y)
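A small sketch of both similarity measures and the first transformation into a distance, using the U1 and U6 vectors from the slide. The helper names are illustrative only.

```python
import math

def dot(x, y):
    """Simple matching similarity (dot product) of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Vector length |x|."""
    return math.sqrt(dot(x, x))

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    return dot(x, y) / (norm(x) * norm(y))

u1 = [0, 3, 2, 0, 2]
u6 = [2, 0, 1, 1, 2]

sm = dot(u1, u6)
cos_sim = cosine_similarity(u1, u6)
cos_dist = 1 - cos_sim  # transformation: distance = 1 - similarity

print(sm, round(norm(u1), 2), round(norm(u6), 2), round(cos_sim, 2))
```

The `1 / similarity` transformation is undefined when the similarity is 0, so the subtraction form is the safer default for cosine similarity.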


What Is Good Clustering?

A good clustering method will produce high quality clusters:
high intra-class similarity: cohesive within clusters
low inter-class similarity: distinctive between clusters

The quality of a clustering method depends on the similarity measure used, its implementation, and its ability to discover some or all of the hidden patterns

Partitioning approach
Constructs various partitions and evaluates them by some criterion
K-means, K-medoids, CLARANS

Hierarchical approach
Hierarchical decomposition of the set of data (objects) using some criterion
Diana, Agnes, BIRCH, CHAMELEON

Density-based approach
Based on connectivity and density functions
DBSCAN, OPTICS, DenClue

Model-based approach
A model is hypothesized for each cluster and the best fit of that model is searched for
EM, SOM, COBWEB

More: grid-based (STING, CLIQUE), frequent pattern-based (pCluster), user-guided or constraint-based (COD, constrained clustering)


Partitioning Approaches

The notion of comparing item similarities can be extended to clusters themselves, by focusing on a representative vector for each cluster

cluster representatives can be actual items in the cluster or other “virtual” representatives such as the centroid
this reduces the number of similarity computations in clustering
clusters are revised successively until a stopping condition is satisfied, or until no more changes to clusters can be made

Reallocation-Based Partitioning Methods
Start with an initial assignment of items to clusters and then move items from cluster to cluster to obtain an improved partitioning
Most common algorithm: k-means

Example: centroid C of the cluster {U2, U3, U7}

      D1  D2   D3   D4  D5
U2     2   1    0    1   0
U3     0   3    0    0   2
U7     1   0    2    2   0
C      1  4/3  2/3   1  2/3

(2 + 0 + 1) / 3 = 1
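The centroid in the example is simply the coordinate-wise mean of the cluster's vectors. A minimal sketch; `centroid` is a hypothetical helper name, and `Fraction` is used only to keep the thirds exact.

```python
from fractions import Fraction

def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [Fraction(sum(column), n) for column in zip(*vectors)]

cluster = [
    [2, 1, 0, 1, 0],  # U2
    [0, 3, 0, 0, 2],  # U3
    [1, 0, 2, 2, 0],  # U7
]

c = centroid(cluster)
print([str(v) for v in c])  # ['1', '4/3', '2/3', '1', '2/3']
```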


K-Means Clustering Method - Example (1)

Given the number of desired clusters K:
1. Randomly assign objects to create K nonempty initial partitions (clusters)
2. Compute the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest centroid (reallocation)
4. Repeat steps 2 and 3 until the assignment does not change

      D1  D2  D3  D4  D5
U1     0   3   2   0   2
U2     2   1   0   1   0
U3     0   3   0   0   2
U4     1   2   0   2   1
U5     0   1   3   0   1
U6     2   0   1   1   2
U7     1   0   2   2   0
U8     3   1   0   0   2

Initial (arbitrary) assignment: C1={U4}, C2={U6}, C3={U7}
Compute the similarity of each item to each cluster (simple matching (dot product) as the similarity measure):

           U1  U2  U3  U4  U5  U6  U7  U8
C1 (U4)     8   6   8  10   3   6   5   7
C2 (U6)     6   5   4   6   5  10   6  10
C3 (U7)     4   4   0   5   6   6   9   3

Allocate each user to the cluster to which it has the highest similarity (the largest value in each column of the above table)

C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}
End of the first iteration



We repeat the process for another reallocation…

… starting from: C1={U1, U2, U3, U4}, C2={U6, U8}, C3={U5, U7}
Compute new cluster centroids using the original user-document matrix:

      D1   D2   D3   D4   D5
C1    3/4  9/4  2/4  3/4  5/4
C2    5/2  1/2  1/2  1/2  4/2
C3    1/2  1/2  5/2  2/2  1/2

Compute a new centroid-user similarity matrix:

      U1     U2   U3    U4   U5   U6    U7    U8
C1    10.25  4.5  9.25  8    5    5.25  3.25  7
C2    6.5    6    5.5   6.5  4    10    4.5   12
C3    7.5    2.5  2.5   4    8.5  5.5   7.5   3

Reallocate the items to clusters with the highest similarity:

C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
End of the second iteration



We repeat the process for another reallocation…

… starting from: C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
Compute new cluster centroids using the original user-document matrix:

      D1   D2   D3   D4   D5
C1    1/3  8/3  2/3  2/3  5/3
C2    7/3  2/3  1/3  2/3  4/3
C3    1/2  1/2  5/2  2/2  1/2

Compute a new centroid-user similarity matrix:

      U1     U2   U3     U4    U5    U6    U7    U8
C1    12.67  4    11.33  8.67  6.33  5.33  3     7
C2    5.33   6    4.67   6.33  3     8.33  4.33  10.33
C3    7.5    2.5  2.5    4     8.5   5.5   7.5   3

Reallocate the items to clusters with the highest similarity:

C1={U1, U3, U4}, C2={U2, U6, U8}, C3={U5, U7}
No change to the clusters, so the algorithm terminates
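The whole worked example can be replayed with a short script. This is a sketch under the lecture's setting (dot product as similarity, U4, U6 and U7 as the initial cluster representatives); the function name `k_means_dot`, the `max_iter` cap and the tie-breaking rule (first cluster with the highest similarity wins) are assumptions made here.

```python
from fractions import Fraction

USERS = {
    "U1": [0, 3, 2, 0, 2], "U2": [2, 1, 0, 1, 0],
    "U3": [0, 3, 0, 0, 2], "U4": [1, 2, 0, 2, 1],
    "U5": [0, 1, 3, 0, 1], "U6": [2, 0, 1, 1, 2],
    "U7": [1, 0, 2, 2, 0], "U8": [3, 1, 0, 0, 2],
}

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def centroid(vectors):
    return [Fraction(sum(col), len(vectors)) for col in zip(*vectors)]

def k_means_dot(data, initial_centroids, max_iter=100):
    """Reallocate items to the most similar centroid until nothing changes."""
    centroids = [list(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assign every item to the cluster whose centroid is most similar.
        new_assignment = {
            name: max(range(len(centroids)), key=lambda j: dot(vec, centroids[j]))
            for name, vec in data.items()
        }
        if new_assignment == assignment:
            break  # no item moved: stop
        assignment = new_assignment
        # Recompute each centroid from the original vectors of its members.
        for j in range(len(centroids)):
            members = [data[n] for n, c in assignment.items() if c == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = centroid(members)
    return assignment, centroids

assignment, centroids = k_means_dot(
    USERS, [USERS["U4"], USERS["U6"], USERS["U7"]]
)
clusters = [sorted(n for n, c in assignment.items() if c == j) for j in range(3)]
print(clusters)
```

Because similarity is a raw dot product rather than a distance, long vectors are favored; the cosine measure or a distance metric would be a natural alternative.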


K-Means Clustering Method

[Figure: k-means on points in the X–Y plane with three cluster centers k1, k2, k3 and clusters C1, C2, C3]

Pick initial cluster centers
Assign each point to the closest cluster center
Move each cluster center to the mean of each cluster
Reassign points closest to a different new cluster center


K-Means Clustering Method - Summary

Strength
Simple, understandable
Relatively efficient; complexity dependent on t·k·n, where n – no. of objects, k – no. of clusters, and t – no. of iterations

Weakness
Applicable only when mean is defined
Need to specify k (X-means)
Results can vary vastly depending on the seeds
Unable to handle noisy data or outliers
Often terminates at a local optimum

Restart with different random seeds (increases the chance of finding the global optimum)

K-medoids – instead of mean, use medians of each cluster
mean of 1, 3, 5, 7, 9 is 5
mean of 1, 3, 5, 7, 1009 is 205
median of 1, 3, 5, 7, 1009 is 5
median: not affected by extreme values

Clustering criterion: $J = \sum_{j=1}^{K} \sum_{x \in C_j} sim(x, m_j)$, where $m_j$ is the representative (mean) of cluster $C_j$

Variations of k-means differ in:
Selection of the initial k means
Distance or similarity measures used
Strategies to calculate cluster means
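The outlier example and the criterion J can be verified numerically. In this sketch the J value is computed from the similarities of each user to its own cluster representative after the first iteration of the lecture example (values copied from that similarity table); the variable names are illustrative.

```python
from statistics import mean, median

clean = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 1009]

print(mean(clean))           # 5
print(mean(with_outlier))    # 205: the mean is dragged by the extreme value
print(median(with_outlier))  # 5: the median is not affected

# Similarity of each member to its own cluster representative
# after the first iteration of the lecture example.
own_cluster_similarities = {
    "C1": [8, 6, 8, 10],  # U1, U2, U3, U4
    "C2": [10, 10],       # U6, U8
    "C3": [6, 9],         # U5, U7
}

# J sums, over all clusters, the similarities of members to their representative.
J = sum(sum(sims) for sims in own_cluster_similarities.values())
print(J)  # 67
```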


Hierarchical Clustering Approaches

Two main types of hierarchical clustering

Agglomerative
Start with the points as individual clusters
At each step, merge the closest pair of clusters until a stopping criterion is met (e.g., one cluster left)

Divisive
Start with one, all-inclusive cluster
At each step, split a cluster until a stopping criterion is met (e.g., each cluster contains a single point)

Traditional hierarchical algorithms use a similarity or distance matrix
Merge or split one cluster at a time

[Figure: objects A, B, C, D, E are merged agglomeratively from Step 0 to Step 4 into AB, CD, CDE and ABCDE; the divisive approach traverses the same hierarchy in the opposite direction, from Step 0 (ABCDE) to Step 4]


Hierarchical Agglomerative Clustering

Basic procedure
1. Place each of N items into a cluster of its own
2. Compute all pairwise item-item similarity coefficients
3. Form a new cluster by combining the most similar pair of current clusters Ci and Cj
   Update similarity matrix by deleting rows/columns corresponding to Ci and Cj
   Calculate the entries in the row corresponding to the new cluster Ci+j
4. Repeat step 3 (forming a new cluster) until a stopping criterion is met

Methods for computing similarity between clusters:
single-link
complete link
group average
centroid method

[Figure: nested clusters of points A–F, numbered in the order in which they are formed]

HAC - Distance Between Two Clusters (1)

Single-link distance between clusters Ci and Cj is the minimum distance between any object in Ci and any object in Cj. The distance is defined by the two closest objects (data points):

$dist(C_i, C_j) = \min_{x,y} \{ dist(x,y) : x \in C_i, y \in C_j \}$

Single-link similarity between clusters Ci and Cj is the maximum similarity between any object in Ci and any object in Cj. The similarity is defined by the two most similar objects:

$sim(C_i, C_j) = \max_{x,y} \{ sim(x,y) : x \in C_i, y \in C_j \}$

It can find arbitrarily shaped clusters, but may cause the undesirable “chain effect” due to noisy points


HAC - Example Incorporating Single-Link Similarity

Similarity matrix

      U1    U2   U3   U4    U5
U1    1     0.9  0.1  0.65  0.2
U2    0.9   1    0.7  0.6   0.5
U3    0.1   0.7  1    0.4   0.3
U4    0.65  0.6  0.4  1     0.8
U5    0.2   0.5  0.3  0.8   1

After merging U1 and U2 (similarity 0.9):

      U12   U3   U4    U5
U12   1     0.7  0.65  0.5
U3    0.7   1    0.4   0.3
U4    0.65  0.4  1     0.8
U5    0.5   0.3  0.8   1

sim(U12,U3) = max{sim(U1,U3), sim(U2,U3)} = max{0.1, 0.7} = 0.7

After merging U4 and U5 (similarity 0.8):

      U12   U3   U45
U12   1     0.7  0.65
U3    0.7   1    0.4
U45   0.65  0.4  1

sim(U12,U45) = max{sim(U12,U4), sim(U12,U5)} = max{0.65, 0.5} = 0.65

After merging U12 and U3 (similarity 0.7):

       U123  U45
U123   1     0.65
U45    0.65  1

After merging U123 and U45 (similarity 0.65):

         U12345
U12345   1

Dendrogram (leaves U1, U2, U3, U4, U5; vertical axis: similarity): U1 and U2 join at 0.9, U4 and U5 at 0.8, U3 joins U12 at 0.7, and U123 joins U45 at 0.65

Possible stopping criteria:
number of clusters
similarity thresholds (do not combine clusters which are not similar)
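The merge sequence of this example can be reproduced with a naive agglomerative loop. The sketch below recomputes cluster similarities from the original matrix at every step instead of updating rows and columns, which is simpler but slower; the function name `agglomerate` and its `threshold` parameter are assumptions made for illustration.

```python
from itertools import combinations

NAMES = ["U1", "U2", "U3", "U4", "U5"]
SIM = [
    [1, 0.9, 0.1, 0.65, 0.2],
    [0.9, 1, 0.7, 0.6, 0.5],
    [0.1, 0.7, 1, 0.4, 0.3],
    [0.65, 0.6, 0.4, 1, 0.8],
    [0.2, 0.5, 0.3, 0.8, 1],
]

def agglomerate(sim, linkage=max, threshold=0.0):
    """Merge the most similar pair of clusters until one is left
    or the best similarity falls below the threshold."""
    def cluster_sim(a, b):
        # linkage=max gives single-link, linkage=min gives complete-link
        return linkage(sim[i][j] for i in a for j in b)

    clusters = [(i,) for i in range(len(sim))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: cluster_sim(*p))
        s = cluster_sim(a, b)
        if s < threshold:
            break  # stopping criterion: remaining clusters are not similar enough
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        merges.append(("+".join(NAMES[i] for i in merged), s))
    return merges, clusters

single_merges, _ = agglomerate(SIM, linkage=max)
print(single_merges)

# With a similarity threshold of 0.68 the last merge (0.65) is not performed.
_, remaining = agglomerate(SIM, linkage=max, threshold=0.68)
print(len(remaining))
```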


HAC - Distance Between Two Clusters (2)

Complete-link distance between clusters Ci and Cj is the maximum distance between any object in Ci and any object in Cj. The distance is defined by the two furthest objects (data points):

$dist(C_i, C_j) = \max_{x,y} \{ dist(x,y) : x \in C_i, y \in C_j \}$

Complete-link similarity between clusters Ci and Cj is the minimum similarity between any object in Ci and any object in Cj. The similarity is defined by the two least similar objects:

$sim(C_i, C_j) = \min_{x,y} \{ sim(x,y) : x \in C_i, y \in C_j \}$

It is sensitive to outliers because they are far away from each other


HAC - Example Incorporating Complete-Link Similarity

Similarity matrix

      U1    U2   U3   U4    U5
U1    1     0.9  0.1  0.65  0.2
U2    0.9   1    0.7  0.6   0.5
U3    0.1   0.7  1    0.4   0.3
U4    0.65  0.6  0.4  1     0.8
U5    0.2   0.5  0.3  0.8   1

After merging U1 and U2 (similarity 0.9):

      U12  U3   U4   U5
U12   1    0.1  0.6  0.2
U3    0.1  1    0.4  0.3
U4    0.6  0.4  1    0.8
U5    0.2  0.3  0.8  1

sim(U12,U3) = min{sim(U1,U3), sim(U2,U3)} = min{0.1, 0.7} = 0.1

After merging U4 and U5 (similarity 0.8):

      U12  U3   U45
U12   1    0.1  0.2
U3    0.1  1    0.3
U45   0.2  0.3  1

sim(U12,U45) = min{sim(U12,U4), sim(U12,U5)} = min{0.6, 0.2} = 0.2

After merging U3 and U45 (similarity 0.3):

       U12  U345
U12    1    0.1
U345   0.1  1

After merging U12 and U345 (similarity 0.1):

         U12345
U12345   1

Dendrogram (leaves U1, U2, U3, U4, U5; vertical axis: similarity): U1 and U2 join at 0.9, U4 and U5 at 0.8, U3 joins U45 at 0.3, and U12 joins U345 at 0.1

HAC - Distance Between Two Clusters (3)

Average-link distance between clusters Ci and Cj is the average distance of all pair-wise distances between the data points in two clusters:

$dist(C_i, C_j) = \mathrm{average}_{x,y} \{ dist(x,y) : x \in C_i, y \in C_j \}$

A compromise between:
the sensitivity of complete-link clustering to outliers
the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects

Centroid method: the distance between two clusters is the distance between their centroids
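To see why average-link is a compromise, the three linkage values can be compared for one pair of clusters. The sketch below uses sim(U12, U3) from the earlier similarity matrix; the centroid-method part uses the user vectors from the k-means example with an arbitrary grouping ({U1, U3} and {U6, U8}) chosen here purely for illustration.

```python
import math

# Pairwise similarities between the members of U12 and U3 (from the HAC example).
pair_sims = [0.1, 0.7]  # sim(U1,U3), sim(U2,U3)

single_link = max(pair_sims)
complete_link = min(pair_sims)
average_link = sum(pair_sims) / len(pair_sims)
print(single_link, complete_link, round(average_link, 2))

def centroid(vectors):
    """Coordinate-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical clusters built from the lecture's user vectors.
cluster_a = [[0, 3, 2, 0, 2], [0, 3, 0, 0, 2]]  # U1, U3
cluster_b = [[2, 0, 1, 1, 2], [3, 1, 0, 0, 2]]  # U6, U8

# Centroid method: Euclidean distance between the two cluster centroids.
centroid_distance = math.dist(centroid(cluster_a), centroid(cluster_b))
print(round(centroid_distance, 3))
```

The average-link value always lies between the complete-link and single-link values for the same pair of clusters.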


Summary (1)

Given the representation of five users and the cosine similarity matrix, use 2-means to group these users into two clusters:

      U1   U2   U3   U4   U5
D1    0.2  0.5  0.7  0.7  0.8
D2    0.7  0.2  0.6  0.3  0.6

      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1

I) Assume the least similar users are the initial centroids. Which users would be used?

II) What would be the clustering obtained after the first iteration? Groups G1 and G2? Would it differ in case U3 and U4 were used as the centroids?

III) Compute the J measure after the first iteration for the above data.

Hint: for our lecture example:

      U1  U2  U3  U4  U5  U6  U7  U8
C1     8   6   8  10   3   6   5   7
C2     6   5   4   6   5  10   6  10
C3     4   4   0   5   6   6   9   3

J = (8 + 6 + 8 + 10) + (10 + 10) + (6 + 9) = 67

IV) Compute the new centroids after the first iteration (for the case of starting with U1 and U2 as the initial centroids).

      C1   C2
D1    0.2  ?
D2    0.7  ?


Summary (2)

Given the cosine similarity matrix for five users, use agglomerative hierarchical clustering (AHC) to group these users:

      U1    U2    U3    U4    U5
U1    1     0.61  0.83  0.63  0.79
U2    0.61  1     0.94  1     0.96
U3    0.83  0.94  1     0.95  0.99
U4    0.63  1     0.95  1     0.97
U5    0.79  0.96  0.99  0.97  1

I) Which users would be clustered together first (irrespective of how the similarity between groups is defined)?

II) Compute the similarity matrix after the first iteration while assuming that the similarity between groups is equal to the maximal/minimal/average similarity of the users contained in these clusters.

      U1    U24  U3    U5
U1    1     ?    0.83  0.79
U24   ?     1    ?     ?
U3    0.83  ?    1     0.99
U5    0.79  ?    0.99  1

sim(U1,U24) = max/min/ave{sim(U1,U2), sim(U1,U4)} = max/min/ave{0.61, 0.63}

III) Present the process of AHC by means of a dendrogram.

IV) How many groups would be obtained if the similarity threshold for AHC were set to 0.8?

