
Clustering and PCA Recitation Nupur Chatterji, Kenny Marino, Colin White

Recitation Clustering and PCA Colin White Nupur Chatterji, Kenny …ninamf/courses/401sp18/recitations... · 2018-04-19 · 3 of 42 Applications (Clustering comes up everywhere




Clustering

Unsupervised Learning - unlabeled data

● Automatically organize data
● Understand structure in data
● Preprocessing for further analysis


Euclidean k-means Clustering

● Input: a set of n points x_1, x_2, ..., x_n in R^d, and an integer k
● Output: k “centers” c_1, c_2, ..., c_k

Try to minimize the sum of squared distances from each point x_i to its closest center.
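To make the objective concrete, here is a minimal sketch in plain Python of the standard Euclidean k-means cost: the sum, over all points, of the squared distance to the nearest center. The helper names `sq_dist` and `kmeans_cost` and the toy data are illustrative assumptions, not part of the recitation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Sum over all points of the squared distance to the closest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


# Toy data (hypothetical): two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

good_cost = kmeans_cost(points, [(0.0, 1.0), (10.0, 1.0)])  # one center per pair
bad_cost = kmeans_cost(points, [(5.0, 1.0)])                # a single shared center
```

Placing one center in the middle of each pair gives a much lower cost than a single center between the pairs, which is the behavior the objective is meant to reward.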


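K-means complexity

● NP-hard to solve exactly, even when k=2 (in general dimension) or d=2 (for general k)
● k=1 is easy to solve (the optimal center is the mean of the points)
● d=1 is easy to solve (dynamic programming)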
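The d=1 case can be sketched as follows. In one dimension an optimal clustering splits the sorted points into k contiguous groups, and each group's best center is its mean (the k=1 case), so a dynamic program over prefixes of the sorted list finds the optimal cost. This is a simple O(k n^2) version written for illustration; the function name `kmeans_1d` is an assumption, and it expects k to be at most the number of points.

```python
def kmeans_1d(values, k):
    """Optimal k-means cost for 1-D data via dynamic programming."""
    xs = sorted(values)
    n = len(xs)

    # Prefix sums of x and x^2 give the cost of any contiguous group in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def seg_cost(i, j):
        """Cost of the group xs[i:j] when its center is the group's mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j] = optimal cost of splitting the first j points into m groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(m - 1, j):  # last group is xs[i:j]
                cand = best[m - 1][i] + seg_cost(i, j)
                if cand < best[m][j]:
                    best[m][j] = cand
    return best[k][n]


cost_two = kmeans_1d([1, 2, 10, 11], 2)  # groups {1, 2} and {10, 11}
cost_one = kmeans_1d([1, 2, 10, 11], 1)  # single center at the mean, 6
```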


Lloyd’s initialization

Initialization is very important for Lloyd’s method

● Random initialization
● Farthest-first traversal: iteratively choose the farthest point from the current set of centers
● d^2-sampling (k-means++): iteratively choose a point v with probability proportional to d_min(v, C)^2, where C is the list of current centers

Theorem: k-means++ always attains an O(log k) approximation to the optimal k-means solution in expectation.
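The d^2-sampling rule can be sketched in plain Python as below. This is a minimal illustration under a few assumptions: the first center is drawn uniformly at random, points are tuples, and the names `kmeanspp_init` and `rng` are hypothetical. Keeping a running array of each point's squared distance to its nearest chosen center means each new center costs O(nd) work, or O(nkd) in total.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_init(points, k, rng=None):
    """Pick k initial centers from `points` by d^2-sampling."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]  # first center: uniform at random
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(x, centers[0]) for x in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        d2 = [min(old, sq_dist(x, centers[-1])) for old, x in zip(d2, points)]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 5.0)]
centers = kmeanspp_init(points, 3, rng=random.Random(0))
```

Because an already chosen point has weight zero, the sampled centers are distinct whenever the input points are distinct.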


K-means runtime

● K-means++ initialization: O(nkd) time
● Lloyd’s method: O(nkd) time per round
● Exponential number of rounds in the worst case
● Small number of rounds in practice
● Expected number of rounds is polynomial under smoothed analysis
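A minimal sketch of Lloyd's method shows where the O(nkd) cost per round comes from: the assignment step compares each of the n points with each of the k centers in d dimensions. The function name `lloyd`, the `max_rounds` cap, and the choice to leave an empty cluster's center unchanged are assumptions made for this illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_rounds=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_rounds):
        # Assignment step: each point joins its closest center, O(nkd).
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))
            clusters[j].append(x)
        # Update step: move each center to the mean of its cluster, O(nd).
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                dim = len(old)
                new_centers.append(
                    tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
                )
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers = lloyd(points, [(0.0, 0.0), (10.0, 0.0)])
```

On this toy input the method converges after two rounds, consistent with the observation that few rounds are needed in practice.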


Hierarchical Clustering

● What if we don’t know the right value of k? (often the case)

Often leads to natural solutions


Runtime:

● O(n^3) is easy
● Can achieve O(n^2 log n)
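As an illustration of the easy cubic bound, the sketch below assumes bottom-up (agglomerative) clustering with single linkage; the surviving slide text does not say which linkage rule is meant, so that choice and the name `single_linkage` are assumptions. Each merge scans all pairs of clusters, which costs at most O(n^2) distance evaluations, and there are fewer than n merges.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[p] for p in points]  # start with every point on its own
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance = closest pair of members.
                d = min(sq_dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
result = single_linkage(points, 3)
```

Stopping the merges at different values of k yields the different levels of the hierarchy, which is why this approach helps when the right k is not known in advance.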
