Special Topics in Learning. Reinaldo Bianchi, Centro Universitário da FEI, 2012
- Slide 1
- Special Topics in Learning. Reinaldo Bianchi, Centro Universitário da FEI, 2012
- Slide 2
- Lecture 4, Part B
- Slide 3
- The K-means algorithm
- Slide 4
- K-Means: a well-known algorithm for clustering patterns. Used when the number of clusters can be fixed in advance: choose the desired number of clusters, then choose cluster centers and members so as to minimize the error. This cannot be done by exhaustive search: there are too many parameters.
- Slide 5
- K-Means algorithm: fix the cluster centers; assign each point to the nearest cluster; recompute each cluster center as the mean of the points it represents; repeat until the centers stop moving.
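The loop above translates almost line for line into code. A minimal sketch in Python with NumPy (function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means on an (n, d) data matrix X with k clusters."""
    rng = np.random.default_rng(seed)
    # Fix the initial cluster centers: k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assign each point to the nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the points it represents.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:      # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        # Repeat until the centers stop moving.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Random seeding is only one common initialisation choice; the quality of the final partition can depend heavily on it.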
- Slide 6
- K-Means can be used with any attribute for which a distance can be computed (see the sketch below).
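To illustrate that flexibility: the assignment step only needs pairwise distances, so other metrics can be plugged in. A sketch using SciPy's `cdist` with the Manhattan metric (note that with non-Euclidean distances, updating centers as plain means loses its optimality guarantee, which is what motivates variants such as K-medians and K-medoids):

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])
centers = X[[0, 1]]                # two provisional centers
# Assignment works for any metric cdist supports, e.g. Manhattan distance.
labels = cdist(X, centers, metric='cityblock').argmin(axis=1)
print(labels)                      # -> [0 1 1 1]
```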
- Slide 7
- Clustering: the partitioning approach is a typical clustering-analysis approach that partitions the data set iteratively. It constructs a partition of the data set into several non-empty clusters (usually with the number of clusters given in advance); in principle, the partition is achieved by minimising the sum of squared distances within each cluster.
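Written out, the criterion being minimised is the within-cluster sum of squared distances, with μ_k denoting the center of cluster C_k:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```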
- Slide 8
- Clustering: given K, find a partition into K clusters that optimises the chosen partitioning criterion. The global optimum would require exhaustively enumerating all partitions; the heuristic alternative is the K-means algorithm (MacQueen, 1967), in which each cluster is represented by its center and the algorithm converges to stable cluster centers.
- Slide 9
- Algorithm: given the cluster number K, and after initialisation (set the seed points), the K-means algorithm is carried out in three steps: 1) assign each object to the cluster with the nearest seed point; 2) compute the seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster); 3) go back to step 1, and stop when no new assignment is made.
- Slide 10
- Example: suppose we have 4 types of medicine, each with two attributes: weight and pH index. Our goal is to group these objects into K=2 groups of medicine.
- Slide 11
- Example: the four medicines A, B, C, D (also the labels in the slide's scatter plot) have the following attribute values.

| Medicine | Weight | pH-Index |
|----------|--------|----------|
| A        | 1      | 1        |
| B        | 2      | 1        |
| C        | 4      | 3        |
| D        | 5      | 4        |
- Slide 12
- Step 1: use initial seed points for partitioning. Assign each object to the cluster with the nearest seed point, using Euclidean distance.
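The extracted slides do not say which seed points were chosen; in the standard version of this example the seeds are medicines A = (1, 1) and B = (2, 1). Under that assumption, a quick NumPy check of this first assignment:

```python
import numpy as np

X = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])   # medicines A, B, C, D
seeds = X[[0, 1]]                  # assumed seeds: A and B
# Euclidean distance from every object to every seed.
dists = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2)
print(dists.round(2))              # [[0. 1.] [1. 0.] [3.61 2.83] [5. 4.24]]
print(dists.argmin(axis=1))        # [0 1 1 1] -> clusters {A} and {B, C, D}
```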
- Slide 13
- Step 2: compute the new centroids of the current partition. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships.
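Under the same assumed seeds, cluster 1 holds only A while cluster 2 holds B, C, and D, so the new centroids come out as:

```latex
\mu_1 = (1,\, 1), \qquad
\mu_2 = \left( \tfrac{2+4+5}{3},\ \tfrac{1+3+4}{3} \right) = \left( \tfrac{11}{3},\ \tfrac{8}{3} \right) \approx (3.67,\ 2.67)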
- Slide 14
- Step 2: renew the membership based on the new centroids. Compute the distance of all objects to the new centroids and assign each object's membership accordingly.
- Slide 15
- Step 3: repeat the first two steps until convergence. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships.
- Slide 16
- Repeat the first two steps until convergence: compute the distance of all objects to the new centroids, and stop when there is no new assignment (a full numeric trace of this example is sketched below).
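Putting the whole loop together for this example (still assuming A and B as the seeds), a sketch that prints each iteration until no assignment changes:

```python
import numpy as np

X = np.array([[1., 1.], [2., 1.], [4., 3.], [5., 4.]])   # A, B, C, D
centers = X[[0, 1]].copy()          # assumed seeds: A and B
labels = None
for it in range(10):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    new_labels = dists.argmin(axis=1)
    if labels is not None and np.array_equal(new_labels, labels):
        print(f"converged at iteration {it}: no new assignment")
        break
    labels = new_labels
    centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    print(f"iter {it}: labels={labels}, centers={centers.round(2)}")
```

With these seeds the run converges after two center updates, ending with clusters {A, B} and {C, D}.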
- Slide 17
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
- Slide 18
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
  2. Randomly guess K cluster center locations.
- Slide 19
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
  2. Randomly guess K cluster center locations.
  3. Each data point finds out which center it is closest to (thus each center owns a set of data points).
- Slide 20
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
  2. Randomly guess K cluster center locations.
  3. Each data point finds out which center it is closest to (thus each center owns a set of data points).
  4. Each center finds the centroid of the points it owns.
- Slide 21
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
  2. Randomly guess K cluster center locations.
  3. Each data point finds out which center it is closest to (thus each center owns a set of data points).
  4. Each center finds the centroid of the points it owns.
  5. ...and jumps there.
- Slide 22
- K-means Demo:
  1. The user sets the number of clusters they'd like (e.g., K=5).
  2. Randomly guess K cluster center locations.
  3. Each data point finds out which center it is closest to (thus each center owns a set of data points).
  4. Each center finds the centroid of the points it owns.
  5. ...and jumps there.
  6. Repeat until terminated!
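The demo loop is easy to reproduce in a few lines. Here is a sketch using scikit-learn on synthetic 2-D blobs (the data and library choice are mine, since the original demo is animated on the slides), with K=5 and random center initialisation as in the steps above:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic 2-D point cloud: five loose blobs, a stand-in for the demo data.
data = np.vstack([rng.normal(loc=c, scale=0.6, size=(40, 2))
                  for c in [(0, 0), (4, 0), (2, 3), (5, 4), (0, 4)]])

# K=5 clusters, randomly guessed initial centers, repeated assign/jump steps.
km = KMeans(n_clusters=5, init='random', n_init=10, random_state=0).fit(data)
print(km.cluster_centers_.round(2))   # final center locations
print(km.n_iter_)                     # iterations until terminated
```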
- Slide 23
- K-means example in Matlab
- Slide 24
- K-means example on the iPad
- Slide 25
- Relevant Issues: K-means is efficient in computation, O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t << n.