Unsupervised Ranking and Ensemble Learning
or
Making good decisions when knowing nothing
Boaz Nadler
Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science
Joint work with
Fabio Parisi, Francesco Strino and Yuval Kluger (Yale Medical School) and Ariel Jaffe (Weizmann)
Nov. 2014
Boaz Nadler Unsupervised Ranking, Ensemble Learning 1
Problem Setup
Consider a binary classification problem over some instance space X, with output Y ∈ {−1,+1}.
Goal: construct a classifier with good generalization (small risk).
Typical Supervised Case:
Labeled training set {x_i, y_i}, i = 1, ..., n
Many methods to construct classifiers; well-understood theory
Supervised Ranking - Multiple Classifiers
Multiple Classifiers:
We are given m classifiers, f_1, ..., f_m
Each classifier was constructed with its own training data, assumptions, design principles, etc.
Two Key Questions:
- Rank: find the most accurate classifier
- Combine them into a more accurate meta-classifier (ensemble learner)
Supervised Ranking and Ensemble Learning
Standard Approach:
Set aside an independent set of labeled validation data.
Rank classifiers by their empirical accuracies on this labeled set.
Many methods to construct ensemble learners [bagging, boosting, etc.]
Central Question in this Talk:
Given the predictions of m classifiers on a large test set,
can we rank them and construct a more accurate meta-classifier
without any labeled data?
Motivating Example I:
Consider an investor intending to trade n stocks.
He gets advice from m entities (sell/buy on each of the n stocks).
Entities = friends, professional investment houses, his mother (in-law), the WSJ, etc.
However, our investor knows nothing about the reliability of these advisors.
Questions:
Can our investor find out who is the most reliable advisor?
How should he combine the (possibly conflicting) advice of the m entities?
Motivating Example II
A biologist wishes to know where, along a given long DNA string, the protein binding sites are.
Common Approach: apply tens of different peak-detection algorithms that predict binding sites.
Each algorithm was derived by a separate lab, using proprietary data and employing different design principles and biological knowledge.
How should our biologist rank these algorithms?
How should she combine them to get a more accurate prediction?
Applications
Common theme of both examples: we are given the predictions or recommendations of multiple advisers of unknown reliability.
No labeled data (difficult/expensive to obtain, or known only in the future).
This scenario appears in a broad range of applications:
- decision science
- crowdsourcing
- medicine
- grant application review panels ...
- etc., etc.
Previous Works / Unsupervised Ensemble Learning
- Majority voting (highly sub-optimal)
- Bayesian approaches: reaching a consensus [DeGroot '74]
- Maximum likelihood estimation via EM [Dawid and Skene '79] [many follow-up works]
- Spectral methods in crowdsourcing [Karger et al. 2010]
Our Contribution
We present a novel, simple spectral analysis of this problem,
revealing low-dimensional structure in this high-dimensional data.
Insights:
- Standard independence assumptions on classifier errors → the off-diagonal of the population covariance matrix of the classifiers has a rank-one structure
- The entries of the eigenvector of this rank-one matrix are ∝ the balanced accuracies of the classifiers
- Allows ranking of the classifiers
- Allows consistent estimation of their parameters (sensitivity, specificity)
- Yields a novel unsupervised ensemble learner
Statistical Formulation
Binary Classification Problem:
- instance space X (typically R^d)
- output space Y = {−1,+1}
- probability density p(x, y), with marginals p_X(x) and p_Y(y)
Binary Classifier: a function f : X → Y
Classifier Quality / Performance
Sensitivity: ψ = Pr[f(X) = 1 | Y = 1] = rate of true positives
Specificity: η = Pr[f(X) = −1 | Y = −1] = rate of true negatives
Balanced Accuracy:

π = (ψ + η)/2

A common quality measure in the presence of class imbalance.
It will arise as a natural measure in our setup!
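As a concrete reference, the three quantities above can be computed from predictions and ground-truth labels as follows (a minimal sketch; the function name and arrays are illustrative):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Sensitivity psi, specificity eta, and balanced accuracy pi,
    for labels and predictions taking values in {-1, +1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    psi = np.mean(y_pred[y_true == 1] == 1)    # Pr[f(X)=1  | Y=1]
    eta = np.mean(y_pred[y_true == -1] == -1)  # Pr[f(X)=-1 | Y=-1]
    return psi, eta, 0.5 * (psi + eta)
```

Note that under heavy class imbalance a classifier that always predicts the majority class attains π = 0.5, which is why π is preferred over plain accuracy here.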
Problem Setup
D = {x_i} ⊂ X: unlabeled test data of n i.i.d. instances from p_X(x).
f_1, ..., f_m: an ensemble of m classifiers of unknown accuracy.
Questions: Given the m × n matrix of all the classifiers' predictions, {f_i(x_k)}, i = 1, ..., m, k = 1, ..., n, can one
- rank the m classifiers?
- combine their predictions into an even more accurate meta-classifier?
Key Point: perform the above without any labeled data!
Assumption
Conditionally Independent Predictors:
As in supervised ensemble methods, assume the errors made by one classifier are independent of those made by the others. For all a_i, a_j, Y ∈ {−1, 1}:

Pr[f_i(X)=a_i, f_j(X)=a_j | Y] = Pr[f_i(X)=a_i | Y] · Pr[f_j(X)=a_j | Y]
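This generative model is easy to simulate: draw Y from the class marginal, then let each classifier err independently according to its own (ψ_i, η_i). A minimal sketch (all parameter values below are illustrative):

```python
import numpy as np

def simulate_predictions(psi, eta, b, n, rng):
    """Simulate an m x n matrix of {-1,+1} predictions from m
    conditionally independent classifiers with sensitivities psi,
    specificities eta, and class imbalance b = Pr[Y=1] - Pr[Y=-1]."""
    psi, eta = np.asarray(psi), np.asarray(eta)
    m = len(psi)
    y = np.where(rng.random(n) < (1 + b) / 2, 1, -1)  # true labels
    # Pr[f_i(x) = 1 | y]: psi_i if y = +1, (1 - eta_i) if y = -1
    p_plus = np.where(y == 1, psi[:, None], (1 - eta)[:, None])
    f = np.where(rng.random((m, n)) < p_plus, 1, -1)
    return f, y
```

Simulated ensembles of this kind are a convenient way to sanity-check the spectral estimates appearing later in the talk.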
The population covariance matrix
Let Q be the m × m population covariance matrix between the classifiers,

q_ij = E[(f_i(X) − μ_i)(f_j(X) − μ_j)]

where μ_i = E[f_i(X)].

Lemma: The entries of Q are

q_ij = 1 − μ_i²                          if i = j
q_ij = (2π_i − 1)(2π_j − 1)(1 − b²)      otherwise

where b ∈ (−1, 1) is the class imbalance,

b = Pr[Y = 1] − Pr[Y = −1].
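The lemma can be checked numerically without any sampling: under conditional independence, E[f_i f_j] = Σ_y Pr[y] E[f_i | Y=y] E[f_j | Y=y], so the population covariance is computable exactly. A minimal check (parameter values are illustrative):

```python
import numpy as np

# Illustrative classifier parameters and class imbalance
psi = np.array([0.80, 0.70, 0.65])  # sensitivities
eta = np.array([0.90, 0.60, 0.75])  # specificities
b = 0.2                             # Pr[Y=1] - Pr[Y=-1]

p_plus, p_minus = (1 + b) / 2, (1 - b) / 2
mu_pos = 2 * psi - 1                # E[f_i | Y = +1]
mu_neg = 1 - 2 * eta                # E[f_i | Y = -1]
mu = p_plus * mu_pos + p_minus * mu_neg  # E[f_i]

# Population E[f_i f_j] via conditioning on Y (conditional independence)
e_ff = p_plus * np.outer(mu_pos, mu_pos) + p_minus * np.outer(mu_neg, mu_neg)
np.fill_diagonal(e_ff, 1.0)         # f_i(X)^2 = 1 always
Q = e_ff - np.outer(mu, mu)

# Lemma: off-diagonal entries equal (2 pi_i - 1)(2 pi_j - 1)(1 - b^2)
v = psi + eta - 1                   # = 2 pi_i - 1
R = (1 - b**2) * np.outer(v, v)
off = ~np.eye(3, dtype=bool)
assert np.allclose(Q[off], R[off])
assert np.allclose(np.diag(Q), 1 - mu**2)
```

The same algebra (q_ij = p₊p₋ (a_i − c_i)(a_j − c_j) with a = 2ψ − 1, c = 1 − 2η) is exactly the proof of the lemma.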
Unsupervised Ranking of Classifiers
Corollary: The off-diagonal entries of Q correspond to a rank-one matrix R = λ v vᵀ, where

λ = (1 − b²) Σ_{j=1}^m (2π_j − 1)²

Importantly, up to a ±1 sign ambiguity,

v_j ∝ (2π_j − 1)

Key Result: If we can consistently estimate the rank-one matrix R, then the classifiers can be consistently ranked by the entries of its leading eigenvector.
Estimating Rank-One Matrix R
Sample Covariance Matrix:
q̂_ij = (1/(n−1)) Σ_{k=1}^n (f_i(x_k) − μ̂_i)(f_j(x_k) − μ̂_j)

where μ̂_j = (1/n) Σ_k f_j(x_k).

By definition, for i ≠ j, E[q̂_ij] = q_ij = r_ij.

We only need to estimate the diagonal entries of R, and then do an eigendecomposition to compute its leading eigenvector.
Estimating Rank-One Matrix R
This is a low-rank matrix-completion problem.
Several computationally efficient methods exist:
- a semi-definite program
- least-squares methods to estimate the diagonal of R
From the estimate R̂, compute its leading eigenvector v̂.
Rank the m classifiers by sorting the entries of v̂.
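A sanity check of the pipeline on population quantities: when the off-diagonal of Q is exactly rank-one, the missing diagonal satisfies r_ii = q_ij q_ik / q_jk for any distinct j, k (with a noisy sample covariance one would least-squares this relation over all triplets, which is one way to realize the least-squares method above). A minimal sketch with illustrative parameters:

```python
import numpy as np

psi = np.array([0.85, 0.75, 0.65, 0.60])  # illustrative sensitivities
eta = np.array([0.80, 0.70, 0.75, 0.55])  # illustrative specificities
b = 0.1
v_true = psi + eta - 1                    # = 2 pi_i - 1, all positive here
m = len(psi)

# Off-diagonal of Q (from the lemma); diagonal treated as unknown
Q_off = (1 - b**2) * np.outer(v_true, v_true)
np.fill_diagonal(Q_off, 0.0)

# Complete the diagonal via the rank-one identity r_ii = q_ij q_ik / q_jk,
# averaged over all triplets of distinct indices
R_hat = Q_off.copy()
for i in range(m):
    vals = [Q_off[i, j] * Q_off[i, k] / Q_off[j, k]
            for j in range(m) for k in range(m)
            if len({i, j, k}) == 3]
    R_hat[i, i] = np.mean(vals)

# Leading eigenvector -> ranking by balanced accuracy
w, U = np.linalg.eigh(R_hat)
v_hat = U[:, -1] * np.sign(U[:, -1].sum())  # fix the +-1 sign ambiguity
ranking = np.argsort(-v_hat)                # best classifier first
```

With exact rank-one input the completed diagonal is exact and the ranking matches the ordering of the balanced accuracies; with a finite-sample covariance the same steps apply to the noisy entries.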
Properties of Solution
Asymptotic Consistency: As the unlabeled set size n → ∞, Q̂ → Q, R̂ → R, and consequently v̂ → v.
If the class imbalance b is known, this method also gives consistent estimates of the π_i.
Estimating ψ_i, η_i:

μ_i = E[f_i(X)] = (ψ_i − η_i) + b(2π_i − 1)

Hence, given b and v, solve a system of 2 linear equations in the 2 unknowns (ψ_i, η_i).
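Concretely, the two equations per classifier are ψ_i + η_i = 2π_i (the definition of π_i) and ψ_i − η_i = μ_i − b(2π_i − 1), which invert in closed form. A quick round-trip check (values illustrative):

```python
import numpy as np

def recover_psi_eta(mu, pi, b):
    """Invert mu = (psi - eta) + b(2*pi - 1) together with
    psi + eta = 2*pi, per classifier."""
    mu, pi = np.asarray(mu), np.asarray(pi)
    diff = mu - b * (2 * pi - 1)   # psi - eta
    psi = pi + diff / 2
    eta = pi - diff / 2
    return psi, eta

# Round trip: start from known (psi, eta, b), form mu and pi, invert
psi0, eta0, b = np.array([0.8, 0.7]), np.array([0.9, 0.6]), 0.3
pi0 = 0.5 * (psi0 + eta0)
mu0 = (psi0 - eta0) + b * (2 * pi0 - 1)
psi1, eta1 = recover_psi_eta(mu0, pi0, b)
```

In practice μ_i is replaced by its empirical mean μ̂_i and π_i by its spectral estimate, so the recovered (ψ̂_i, η̂_i) inherit their O_P(1/√n) fluctuations.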
Properties of Solution
Stability for Finite Sample Size:
The fluctuations are

Q̂ − Q = O_P(1/√n)

Via a matrix perturbation approach, similar to [N., AoS 08'],

v̂ − v = O_P((1/λ)(1/√n))

Remark: If all m classifiers are better than random, there is a spectral gap, λ = O(m), and

ψ̂_i − ψ_i, η̂_i − η_i = O_P(1/√n)
Estimating Class Imbalance

Look at the joint 3-D covariance of triplets of classifiers.

Lemma: If classifiers make independent errors in triplets, the off-diagonal part of the 3-D tensor is rank-one,

E[(f_i(X) − μ_i)(f_j(X) − μ_j)(f_k(X) − μ_k)] = α(b) v_i v_j v_k

where

α(b) = −2b / √(1 − b²)

Hence: compute the 3-D tensor; estimate the single scalar α(b) via least squares; invert to estimate b.

Asymptotically consistent, and no tensor decomposition is needed.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 20
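A minimal numerical sketch of this procedure (the function name is illustrative; it assumes v is scaled so that R ≈ vvᵀ, as recovered from the rank-one structure, and that all classifiers output ±1):

```python
import numpy as np
from itertools import combinations

def estimate_class_imbalance(F, v):
    """F: (m, n) matrix of +/-1 predictions; v: the vector with R ~ v v^T.
    Fits the single scalar alpha in T_ijk = alpha * v_i v_j v_k over all
    off-diagonal triplets i<j<k by least squares, then inverts
    alpha(b) = -2b / sqrt(1 - b^2)."""
    G = F - F.mean(axis=1, keepdims=True)            # centered predictions
    n = F.shape[1]
    targets, design = [], []
    for i, j, k in combinations(range(F.shape[0]), 3):
        targets.append((G[i] * G[j] * G[k]).sum() / n)   # empirical T_ijk
        design.append(v[i] * v[j] * v[k])
    design, targets = np.array(design), np.array(targets)
    alpha = design @ targets / (design @ design)     # 1-D least squares
    return -alpha / np.sqrt(4.0 + alpha**2)          # invert alpha(b)
```

The inversion follows from solving α = −2b/√(1 − b²) for b, which gives b = −α/√(4 + α²).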
Learning Mixtures of Product Distributions

If we assume all classifiers jointly make independent errors, the problem is equivalent to learning a mixture of discrete product distributions.

Each column vector of length m comes from one of two distributions, according to the unobserved latent variable y (the class label).

[Freund and Mansour 99']. Several computationally efficient methods have been developed for this; recently, via tensor decompositions [Anandkumar, Hsu and Kakade 12'].

Our method provides an elegant and much simpler approach for the binary case.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 21
PART II: How to combine the m classifiers ?
Boaz Nadler Unsupervised Ranking, Ensemble Learning 22
Maximum Likelihood

Unknowns:
- ψ_i, η_i = specificities and sensitivities of the m classifiers
- y = (y_1, . . . , y_n) = true labels of the n instances

Common Approach: Look for ψ_i, η_i, y that maximize the likelihood.

Given the assumption of independent errors across classifiers, for an instance x,

L(f_1(x), . . . , f_m(x) | y, ψ_i, η_i) = ∏_{i=1}^{m} Pr[f_i(x) | y, ψ_i, η_i]

Assuming instances are i.i.d.,

L(f_i(x_j) | y, ψ_i, η_i) = ∏_{j=1}^{n} L(f_i(x_j) | y_j, ψ_i, η_i)
Boaz Nadler Unsupervised Ranking, Ensemble Learning 23
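In code, the likelihood factorizes over classifiers and instances; a minimal sketch (illustrative function name, using log-likelihoods for numerical stability):

```python
import numpy as np

def log_likelihood(F, y, psi, eta):
    """Log-likelihood of the +/-1 predictions F (shape (m, n)) under the
    conditional-independence model, for given labels y and (psi, eta).
    Pr[f_i correct | y_j] is psi_i on positives and eta_i on negatives."""
    p_correct = np.where(y == 1, psi[:, None], eta[:, None])  # (m, n)
    p_obs = np.where(F == y, p_correct, 1.0 - p_correct)
    return np.log(p_obs).sum()
```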
Maximum Likelihood Solution

If we knew the sensitivities and specificities of the m classifiers:

Lemma: The MLE is a weighted linear ensemble classifier,

ŷ^(ML) = sign( ∑_i f_i(x) ln α_i + ln β_i )

where

α_i = ψ_i η_i / ((1 − ψ_i)(1 − η_i)),   β_i = ψ_i(1 − ψ_i) / (η_i(1 − η_i)).

Unfortunately:
(i) it depends on the unknown classifiers' specificities and sensitivities!
(ii) the likelihood is not convex when ψ_i, η_i, y are all unknown.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 24
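The lemma's decision rule is a one-liner once (ψ_i, η_i) are known; a sketch with an illustrative function name:

```python
import numpy as np

def mle_ensemble(F, psi, eta):
    """Weighted linear ensemble: the ML labels given known sensitivities
    psi and specificities eta. F is an (m, n) matrix of +/-1 predictions."""
    alpha = (psi * eta) / ((1 - psi) * (1 - eta))
    beta = (psi * (1 - psi)) / (eta * (1 - eta))
    scores = np.log(alpha) @ F + np.log(beta).sum()
    return np.sign(scores)
```

When all classifiers share the same ψ_i = η_i, the weights are equal and β_i = 1, so the rule reduces to a simple majority vote.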
Iterative EM-type Solutions

[Dawid and Skene, 1979] Expectation-Maximization:

- Given a guess ŷ of the n labels, estimate ψ_i, η_i of the m classifiers.

- Given sensitivity and specificity estimates, construct an approximate MLE of the labels, ŷ.

Iterate, increasing the likelihood until convergence.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 25
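A minimal hard-assignment sketch of this alternation (not the exact Dawid–Skene updates, which use soft posteriors; names are illustrative):

```python
import numpy as np

def em_dawid_skene(F, n_iter=50):
    """Hard-EM sketch for the binary model. F: (m, n) matrix of +/-1
    predictions. Starts from majority vote; alternates (psi, eta)
    estimates with ML label updates. Converges only to a local maximum,
    so the initial guess matters."""
    y = np.sign(F.sum(axis=0) + 1e-9)             # init: majority vote
    for _ in range(n_iter):
        pos, neg = (y == 1), (y == -1)
        # M-step: sensitivity / specificity of each classifier,
        # clipped away from {0, 1} to keep the logs finite
        psi = np.clip((F[:, pos] == 1).mean(axis=1), 1e-3, 1 - 1e-3)
        eta = np.clip((F[:, neg] == -1).mean(axis=1), 1e-3, 1 - 1e-3)
        # E-step (hard): plug (psi, eta) into the weighted linear rule
        w = np.log(psi * eta / ((1 - psi) * (1 - eta)))
        c = np.log(psi * (1 - psi) / (eta * (1 - eta))).sum()
        y_new = np.sign(w @ F + c)
        if np.array_equal(y_new, y):
            break
        y = y_new
    return y, psi, eta
```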
EM in practice

Widely used:

- Jin & Ghahramani, Learning with multiple labels, 2003.

- Raykar, Yu, Zhao et al., Learning from crowds, 2010.

- Whitehill, Ruvolo, Wu et al., Whose vote should count more..., 2009.

- Welinder, Branson, Belongie, Perona, Multidimensional wisdom of crowds, 2010.

- etc.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 26
Iterative EM solution

Key limitations:

- Convergence guaranteed only to a local maximum

- Need a good initial guess of y

- No performance guarantees on the solution

Question: Can we do better?

Answer: Yes we can! Simply plug our estimates ψ̂_i, η̂_i into the MLE.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 27
Real Datasets

Novel, fully-unsupervised ensemble learner: SML = spectral meta-learner.

Does the method work on real data? The independence assumptions are not likely to hold exactly.
Boaz Nadler Unsupervised Ranking, Ensemble Learning 28
Real Datasets

Ensemble of classifiers: 33 standard classification algorithms as implemented in Weka (kNN, decision trees, SVM, logistic regression, naive Bayes, etc.)

On various real datasets: split into training and test data. Each algorithm is trained on a separate subset of the whole training data.

Consistency Checks:
- Is λ_1(R)/Trace(R) close to one (i.e., is the matrix approximately rank-one)?
- Do the classifiers approximately make independent errors?
Boaz Nadler Unsupervised Ranking, Ensemble Learning 29
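The first consistency check can be sketched as follows (illustrative name; assumes R is the symmetric covariance matrix with its diagonal completed per the rank-one model):

```python
import numpy as np

def rank_one_score(R):
    """Fraction of the trace captured by the leading eigenvalue of R.
    A score close to 1 suggests the rank-one model is plausible."""
    eigvals = np.linalg.eigvalsh(R)   # ascending order
    return eigvals[-1] / np.trace(R)
```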
Example:

Financial Stock Prediction: (NYSE)

x = (opening, closing, low, high price, and volume) × 9 days
y = high price on the 10th day

Prediction Goal:

(high_{day 10} − high_{day 9}) / high_{day 9} > 1.05
Boaz Nadler Unsupervised Ranking, Ensemble Learning 30
Examples:

[Figure: balanced accuracy (π) on the NYSE dataset, roughly in the range 0.45–0.60, comparing SML and iMLE against Voting, the best inferred predictor, and the ensemble median]
Boaz Nadler Unsupervised Ranking, Ensemble Learning 31
Summary

- Presented a spectral analysis of unsupervised ranking and ensemble learning

- Key idea: exploit the structure of a hidden low-rank matrix

Future work / Open Questions:
- Regression and multi-class problems
- Instance difficulty
- Cartels
- Real applications
- Relation to random matrix theory
- Low-rank matrix recovery with missing data

Parisi et al., PNAS, 2014.
Jaffe, Nadler, Kluger, under review, 2014.
www.wisdom.weizmann.ac.il/∼nadler/
Boaz Nadler Unsupervised Ranking, Ensemble Learning 32
On the paper by Dawid and Skene

Tech. report by Dawid in 1972.

Dawid and Skene, Maximum likelihood estimation of observer error rates using the EM algorithm, JRSS-C, 1979.

Over 300 citations on Google Scholar.

Thank You / The End
Boaz Nadler Unsupervised Ranking, Ensemble Learning 33