Stochastic Unsupervised Learning on Unlabeled Data
July 2, 2011
Presented by Jianjun Xie – CoreLogic
Collaborated with Chuanren Liu, Yong Ge and Hui Xiong – Rutgers, the State University of New Jersey
Our Story
“Let’s set up a team to compete in another data mining challenge” – a call with Rutgers
Is it a competition on data preprocessing?
Transform the problem into a clustering problem:
How many clusters are we shooting for?
What distance measure works better?
Go with stochastic K-means clustering.
Dataset Recap
Five real-world data sets were extracted from different domains
No labels were provided during the unsupervised learning challenge
The withheld labels are multi-class; some records can belong to multiple labels at the same time
Performance was measured by a global score, defined as the Area Under the Learning Curve (ALC)
A simple linear classifier (Hebbian learner) was used to calculate the learning curve
Focus on small numbers of training samples via log2 scaling on the x-axis of the learning curve
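The global score can be sketched as a trapezoidal area under the learning curve on a log2-scaled x-axis, normalized over the plotted range. This is a minimal illustration, not the challenge's official scoring code; the exact normalization used by the organizers may differ.

```python
import numpy as np

def alc(sample_sizes, auc_scores):
    """Area under the learning curve (ALC) sketch: trapezoidal area
    of the score curve over a log2-scaled x-axis, normalized by the
    width of that axis so a constant curve returns its own value."""
    x = np.log2(np.asarray(sample_sizes, dtype=float))  # log2 scaling on x-axis
    y = np.asarray(auc_scores, dtype=float)
    # Trapezoidal rule computed explicitly to stay portable across NumPy versions
    area = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))
    return area / (x[-1] - x[0])
```

For example, a flat learning curve of 0.8 at 1, 2, 4, and 8 training samples scores 0.8; the log2 scaling weights the small-sample regime heavily, matching the challenge's emphasis on few training examples.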
Evolution of Our Approaches
Simple Data Preprocessing
Normalization: Z-scale (std=1, mean=0)
TF-IDF on text recognition (TERRY dataset)
PCA:
PCA on raw data
PCA on normalized data
Normalized PCA vs. non-normalized PCA
K-means Clustering
Cluster on top N normalized PCs
Cosine similarity vs. Euclidean distance
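The preprocessing pipeline above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the function name is hypothetical, and cosine-similarity K-means is approximated by L2-normalizing the rows and then running ordinary Euclidean K-means (the spherical K-means trick), since scikit-learn's KMeans does not take a cosine metric directly.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def preprocess_and_cluster(X, n_pcs, K, random_state=0):
    """Z-scale the data, project onto the top-N principal components,
    then cluster with a cosine-like K-means."""
    Xz = StandardScaler().fit_transform(X)          # Z-scale: mean=0, std=1
    pcs = PCA(n_components=n_pcs).fit_transform(Xz)  # top-N normalized PCs
    # Unit-length rows make Euclidean K-means act like cosine-similarity
    # clustering (an approximation of spherical K-means)
    pcs = normalize(pcs)
    return KMeans(n_clusters=K, n_init=10,
                  random_state=random_state).fit_predict(pcs)
```

For the TERRY text dataset, a TF-IDF matrix (e.g. from `TfidfVectorizer`) would replace the raw `X` before this pipeline.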
Stochastic Clustering Process
Given data set X, number of clusters K, and iteration count N
For n = 1, 2, …, N:
Randomly choose K seeds from X
Perform K-means clustering, assigning each record a cluster membership I_n
Transform I_n into a binary representation
Combine the N binary representations together as the final result
Example of binary representation of clusters:
Say cluster labels = 1, 2, 3; the binary representations will be (1 0 0), (0 1 0), and (0 0 1)
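The stochastic clustering process above can be sketched in a few lines of Python. This is an illustrative implementation, not the authors' original code; the function name is hypothetical, and scikit-learn's KMeans (seeded with the randomly chosen records via its array-form `init`) stands in for whatever K-means implementation they used.

```python
import numpy as np
from sklearn.cluster import KMeans

def stochastic_clustering(X, K, N, rng=None):
    """Run K-means N times, each with K randomly chosen records as
    seeds, and stack the one-hot cluster memberships side by side."""
    rng = np.random.default_rng(rng)
    blocks = []
    for _ in range(N):
        # Randomly choose K distinct records from X as initial seeds
        seeds = X[rng.choice(len(X), size=K, replace=False)]
        km = KMeans(n_clusters=K, init=seeds, n_init=1).fit(X)
        # Transform the membership vector I_n into its binary
        # (one-hot) representation, e.g. label 2 of 3 -> (0 1 0)
        blocks.append(np.eye(K)[km.labels_])
    # Combine the N binary representations as the final result
    return np.hstack(blocks)
```

The output is a binary matrix with K×N columns per record, which then serves as the submitted feature representation.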
Our final approach
Results of Our Approaches
Dataset Harry – human action recognition
Results
Dataset Rita – object recognition
Results
Dataset Sylvester – ecology
Results
Dataset Terry – text recognition
Results
Dataset Avicenna – Arabic manuscripts
Summary on Results
Overall rank: 2nd.
Dataset    Winner Valid  Winner Final  Winner Rank  Our Valid  Our Final  Our Rank
Avicenna   0.1744        0.2183        1            0.1386     0.1906     6
Harry      0.8640        0.7043        6            0.9085     0.7357     3
Rita       0.3095        0.4951        1            0.3737     0.4782     5
Sylvester  0.6409        0.4569        6            0.7146     0.5828     1
Terry      0.8195        0.8465        1            0.8176     0.8437     2
Discussions
Stochastic clustering can generate better results than PCA in general
Cosine similarity is better than Euclidean distance
Normalized data is better than non-normalized data for K-means in general
The number of clusters (K) is an important factor, but could be relaxed for this particular competition.
Thank you! Questions?