Active Learning and Selective Sensing
Can’t Learn Without You
[Diagram: a feedback loop between Sensing and Computing: “Closing the Loop”]
Learning to Discover
Sequential approach: select new samples/experiments that are predicted to be maximally informative in discriminating hypotheses
[Diagram: loop of select sensing action → sample/sense → observe/infer]
Laplace
Discovery! Decided to make new astronomical measurements when “the discrepancy between prediction and observation [was] large enough to give a high probability that there is something new to be found.” Jaynes (1986)
Learning a decision hyperplane
[Figure: points labeled + and − separated by a hyperplane]
Selective sampling yields an exponential speed-up in learning!
Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, “Selective sampling using the query by committee algorithm,” Machine Learning, 28(2–3):133–168, 1997.
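The cited query-by-committee idea fits in a few lines. Below is a toy sketch for 1-D thresholds only, with all names illustrative (the paper analyzes general hypothesis classes): a streamed point is sent to the oracle only when two hypotheses drawn from the current version space disagree on it.

```python
import random

# Toy query-by-committee (QBC) sketch for 1-D thresholds h_t(x) = 1{x <= t}.
# Illustrative only; not the cited paper's pseudocode.

def qbc(stream, true_label, lo=0.0, hi=1.0):
    """Query a streamed point only when two hypotheses drawn at random
    from the current version space (lo, hi) disagree on it."""
    queries = 0
    for x in stream:
        t1, t2 = random.uniform(lo, hi), random.uniform(lo, hi)
        if (x <= t1) != (x <= t2):      # committee of two disagrees
            queries += 1
            if true_label(x):           # label 1: threshold lies right of x
                lo = max(lo, x)
            else:                       # label 0: threshold lies left of x
                hi = min(hi, x)
    return (lo + hi) / 2, queries

est, q = qbc((random.random() for _ in range(10000)), lambda x: x <= 1/3)
```

On a stream of n points this typically issues only on the order of log n label requests while matching the accuracy of labeling all n, which is the exponential speed-up claimed above.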
Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro, and R. Nowak, “Distilled sensing: selective sampling for sparse signal recovery,” in Proceedings of AISTATS 2009, pp. 216–223.
Outline
1. Active Learning: selective sampling for binary prediction problems
2. Distilled Sensing: selective sensing for sparse signal recovery
Common theme: feedback between data analysis and data collection can be crucial for effective learning and inference
Binary Search
[Figure: a hypothesis space of candidate faces]
“Does the person have blue eyes?”
“Is the person wearing a hat?”
Binary Search and Threshold Functions
“Binary Search” works very well in simple conditions.
Where is it shady vs. sunny?
[Figure: a threshold function y(x) jumping between 1 and 0 at an unknown location; each query reveals one more binary digit of that location, e.g. 1/3 = 0.0101… in binary]
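A minimal noiseless bisection sketch, assuming the slide’s convention that the label is 1 on the shady side (x ≤ θ): each answered query halves the feasible interval, i.e., reveals the next binary digit of θ.

```python
# Noiseless binary search for a threshold function f(x) = 1{x <= theta}.
# Each query halves the interval containing theta, revealing one more
# binary digit: for theta = 1/3 = 0.0101... the answers are 0, 1, 0, 1, ...

def bisect_threshold(label, n_queries):
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if label(mid):              # label 1: theta is at or right of mid
            lo = mid
        else:                       # label 0: theta is left of mid
            hi = mid
    return lo, hi                   # interval of width 2**-n_queries

theta = 1 / 3
print(bisect_threshold(lambda x: x <= theta, 4))   # (0.3125, 0.375)
```

The four queries land at 1/2, 1/4, 3/8, 5/16 with answers 0, 1, 0, 1, exactly the sequence pictured on the next slide.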
Binary Search and Threshold Functions
[Figure: bisection on the labeled points; queries at 1/2, 1/4, 3/8, 5/16, … receive labels 0, 1, 0, 1, …, halving the feasible interval each time]
passive learning: all points are labeled (n samples yield effectively log n useful bits)
active learning: sequentially select points for labeling (n samples yield n bits)
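A quick way to see the bit counts, in a toy comparison that assumes the passive learner receives labels at uniformly random locations:

```python
import random

# Passive: n uniformly random labeled points localize theta only to an
# interval of expected width ~2/n (about log2(n) useful bits).
# Active: n bisection queries localize theta to width 2**-n (n bits).

def passive_width(theta, n):
    xs = [random.random() for _ in range(n)]
    lo = max([x for x in xs if x <= theta], default=0.0)  # last label-1 point
    hi = min([x for x in xs if x > theta], default=1.0)   # first label-0 point
    return hi - lo

def active_width(n):
    return 2.0 ** -n

print(passive_width(1/3, 1000), active_width(1000))
```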
Bounded and Unbounded Noise
“bounded noise”: at every location the label is strictly more (or less) probably 1, i.e., the probability of label 1 stays bounded away from 1/2
“unbounded noise”: at the threshold the label is like the toss of a fair coin
[Figure: regions marked “more probably 0” and “more probably 1”]
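Under bounded noise, one simple repair of binary search is to repeat each query and take a majority vote; the repetition count below comes from a Hoeffding bound. This is an illustrative sketch, not the refined probabilistic-bisection procedure analyzed by Castro and Nowak.

```python
import math, random

# Majority-vote bisection under bounded noise: each label is correct
# with probability >= 1/2 + delta at every location. Repeating each
# query enough times makes every vote reliable (union bound over steps).

def noisy_bisect(noisy_label, n_steps, delta, fail_prob=0.01):
    reps = math.ceil(math.log(2 * n_steps / fail_prob) / (2 * delta ** 2))
    lo, hi = 0.0, 1.0
    for _ in range(n_steps):
        mid = (lo + hi) / 2
        votes = sum(noisy_label(mid) for _ in range(reps))
        if votes > reps / 2:        # majority says label 1: theta >= mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# e.g. labels flipped with probability 0.3 (so delta = 0.2):
flip = lambda x: (x <= 1/3) != (random.random() < 0.3)
print(noisy_bisect(flip, 20, 0.2))
```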
Learning Rates for Multidimensional Thresholds
Compare with passive learning.
Active Learning: Theorem (R. Castro and RN ’07)
The Learner’s loop: consider hypotheses → select a sample/query that is highly discriminative → query the oracle → eliminate or discount inconsistent hypotheses
[Diagram: Learner interacting with an oracle over a query space and a hypothesis space]
“With every mistake, we must surely be learning.” G. Harrison
Generalized Binary Search (aka Splitting Algorithm)
Selects a query for which disagreement among viable hypotheses is maximal.
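In code the splitting rule is one line: pick the query whose predicted labels are most evenly split across the viable hypotheses. A minimal sketch with illustrative names, where hypotheses are functions mapping a query to ±1:

```python
# Generalized binary search / splitting algorithm (minimal sketch):
# query where the viable hypotheses disagree most, then discard the
# hypotheses the oracle's answer rules out.

def gbs(hypotheses, queries, oracle):
    viable = list(hypotheses)
    n_queries = 0
    while len(viable) > 1:
        # |sum of h(q)| near 0 means the +1/-1 split is near 50/50
        q = min(queries, key=lambda q: abs(sum(h(q) for h in viable)))
        y = oracle(q)                        # returns +1 or -1
        viable = [h for h in viable if h(q) == y]
        n_queries += 1
    return viable[0], n_queries
```

If every split is near-balanced, the viable set shrinks geometrically, so roughly log2 of the number of hypotheses queries suffice.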
Example: Two-Dimensional Thresholds
[Figure: the plane split into −1 and +1 regions by a linear boundary]
How well does GBS work?
Geometry of GBS
Classic Binary Search is possible because the hypotheses are ordered with respect to queries. We need a similar structure for more general hypothesis spaces.
To that end, note that the hypotheses induce a partition of the query space into equivalence sets.
[Figure: query space partitioned into equivalence sets A, A′, …]
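Concretely, the partition groups queries by their response pattern across the hypotheses; a short sketch with illustrative names:

```python
from collections import defaultdict

# Two queries are equivalent when every hypothesis labels them the same;
# grouping by that response pattern yields the sets A, A', ... above.

def equivalence_sets(hypotheses, queries):
    classes = defaultdict(list)
    for q in queries:
        classes[tuple(h(q) for h in hypotheses)].append(q)
    return list(classes.values())
```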
Geometric Condition for GBS Convergence
Classic Binary Search Revisited
[Figure: labels 0 0 0 0 0 0 0 0 1 1 1 1 1 1 along a line, with the unknown correct threshold at i*/n]
Theorem 1 Proof Sketch:
[Figures contrasting the ‘good’ situation with the ‘bad’ situation: queries x, x′ and offsets −c, +c around the threshold]
Ex. Linear Thresholds in Two Dimensions
maximally informative queries
Linear Thresholds
Example
Suppose we have a sensor network observing a binary activation pattern with a linear boundary. How many sensors must be queried to determine the pattern?
100 sensors, 9900 possible linear boundaries; the correct boundary is determined after querying 12 sensors.
[Plots: number of hypotheses vs. queries, and log number of hypotheses vs. queries]
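The experiment is easy to re-create in spirit with the gbs sketch above. The construction below (random sensor locations, random linear boundaries deduplicated by the labeling they induce) is a hypothetical stand-in for the slide’s exact setup, not its code.

```python
import math, random

# Hypothetical re-creation: 100 sensors, hypotheses = distinct +/-1
# labelings induced by random linear boundaries, queried with gbs(...).

random.seed(0)
sensors = [(random.random(), random.random()) for _ in range(100)]

def halfplane(a, b, c):
    return lambda p: 1 if a * p[0] + b * p[1] + c >= 0 else -1

labelings = {}
for _ in range(20000):                      # sample many candidate boundaries
    th = random.uniform(0, 2 * math.pi)
    h = halfplane(math.cos(th), math.sin(th), random.uniform(-1, 1))
    labelings.setdefault(tuple(h(p) for p in sensors), h)

hyps = list(labelings.values())
truth = random.choice(hyps)                 # the unknown activation pattern
_, n_queries = gbs(hyps, sensors, truth)    # gbs from the sketch above
print(len(hyps), n_queries)                 # ~log2(#hypotheses) queries
```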
Conclusions
Minimax Bounds: selective sampling/querying can accelerate the learning of threshold functions.
Generalized Binary Search: multidimensional threshold functions can be learned at the optimal rate by selecting maximally discriminative queries.
R. Castro and RN, “Minimax Bounds for Active Learning,” IEEE Trans. Info. Theory, pp. 2339–2353, 2008.
RN, “Generalized Binary Search,” Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, 2008.
Distilled Sensing
Detection/Estimation of Sparse Signals
Applications: fMRI, Astrophysics, Genomics
How reliably can we infer sparse patterns?
Sparse Signal Model
Signal support set: I_s = {i : x_i ≠ 0}
In this talk we will assume the nonzero components share a common positive amplitude µ.
Noisy Observation Model
y_i = x_i + ε_i, with ε_i i.i.d. N(0, 1), i = 1, …, n
Suppose we want to locate just one signal component. Because of noise, even if no signal is present, the largest observation is already about √(2 log n).
How small can µ be so that we can still reliably locate the signal components from the observations?
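The √(2 log n) noise floor is easy to check numerically:

```python
import math, random

# With no signal at all, the largest of n standard-normal observations
# already sits near sqrt(2 log n), so a lone component with amplitude
# mu below that level cannot be picked out reliably.

n = 100_000
noise_only = max(random.gauss(0.0, 1.0) for _ in range(n))
print(noise_only, math.sqrt(2 * math.log(n)))    # both close to 4.8
```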
Sparse Support Recovery
When testing a large number of hypotheses simultaneously we are bound to make errors.
Approaches:
Control the probability of perfect localization of the support (Bonferroni correction): very conservative.
False Discovery Rate control: adaptive control of the relative proportion of errors (Benjamini & Hochberg ’95).
Recall the definition of the signal support set
Goal:Estimate the support as well as possible. Let
be the outcome of a support estimation procedure.
# falsely discovered com
ponents
# discovered com
ponents
# missed com
ponents
# true components
False
Discovery
Proportion
Non D
iscovery
Proportion
Since nis typically very large it makes sense to
study asymptoticperform
ance, as n→∞
.
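The two proportions transcribe directly into code (the set names S and S_hat are illustrative):

```python
# FDP = falsely discovered / discovered; NDP = missed / true components.

def fdp(S_hat, S):
    return len(S_hat - S) / max(len(S_hat), 1)

def ndp(S_hat, S):
    return len(S - S_hat) / max(len(S), 1)

S, S_hat = {1, 4, 7}, {4, 7, 9, 12}
print(fdp(S_hat, S), ndp(S_hat, S))   # 0.5 and 0.333...
```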
Known Results (Jin & Donoho ’03)
Assume the signal is very sparse: number of signal components |I_s| = n^(1−β), with β ∈ (1/2, 1).
Example: β = 3/4
n = 10000 ⇒ |I_s| = 10
n = 1000000 ⇒ |I_s| = 32
Known Results (Jin & Donoho ’03)
[Phase diagram: signal strength vs. sparsity, with estimable and non-estimable regions]
These asymptotic results tell us how strong the signals need to be for reliable signal localization.
A Generalization of the Sensing Model
Allow multiple observations, subject to a sampling energy budget:
y_{i,t} = √(γ_{i,t}) x_i + ε_{i,t}, with Σ_{i,t} γ_{i,t} ≤ n
The γ_t are called the sensing vectors.
(Note: in the previous work a single observation was considered, where γ_i = 1 for all i.)
Distilled Sensing
Observe; keep only the coordinates whose observations are positive; re-allocate the sensing energy to the survivors and observe again. Proceeding in this fashion, gradually focus on the signal subspace.
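A compact sketch of the procedure under the assumptions above. The equal per-pass energy split and the final √(2 log |alive|) threshold are simplifications of the paper’s choices, so treat this as illustrative:

```python
import math, random

# Distilled sensing sketch: in each pass, spend an equal share of the
# energy budget on the surviving coordinates, keep those with y > 0
# (noise-only coordinates survive with prob. ~1/2, signal coordinates
# almost always), and threshold whatever survives the final pass.

def distilled_sensing(x, k, budget):
    alive = list(range(len(x)))
    for step in range(k + 1):
        gamma = budget / ((k + 1) * len(alive))     # energy per coordinate
        y = {i: math.sqrt(gamma) * x[i] + random.gauss(0, 1) for i in alive}
        if step < k:
            alive = [i for i in alive if y[i] > 0]  # distill
            if not alive:
                return set()
    tau = math.sqrt(2 * math.log(len(alive)))       # final threshold
    return {i for i in alive if y[i] > tau}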
Enhanced Sensitivity by Selectivity
Theorem 2 (J. Haupt, R. Castro and RN ’08)
Furthermore, if one does not allow active sensing, then the previous results (equivalent to k = 0) cannot be improved.
Distilled Sensing Example
[Image panels: original signal (~0.8% non-zero components); noisy version of the image (k = 0); noisy versions of the image (k = 5); passive sensing recovery (FDR = 0.01); distilled sensing recovery (FDR = 0.01)]
Thresholds of Perceptibility
[Phase diagram: signal strength vs. sparsity, marking where recovery is possible from passive sensing]
These results suggest we might be able to estimate signals with amplitudes growing slower than √(2 log n).
Universal Perceptibility
Theorem 3 (J. Haupt, R. Castro and RN ’09)
Corollary
Proof Sketch (Theorem 2)
With high probability each distillation keeps almost all the non-zero components and rejects about half of the non-signal components.
Energy in the signal subspace doubles at each step, since each pass’s budget is spread over roughly half as many surviving components.
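A toy accounting of that doubling (the numbers are purely illustrative):

```python
# Each pass keeps ~all s signal coordinates but ~half of the m noise
# coordinates, so a fixed per-pass budget concentrates: the share
# landing on the signal set roughly doubles while m >> s.

s, m, per_pass_budget = 32, 1_000_000, 1.0
for step in range(5):
    signal_share = per_pass_budget * s / (s + m)
    print(step, signal_share)       # ~3.2e-05, 6.4e-05, 1.3e-04, ...
    m //= 2                         # distillation halves the noise set
```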
Now you see it, now you don’t!
Weak signals/patterns are imperceptible without selective sensing!
[Figure: a sparse signal buried in noise]
J. Haupt, R. Castro and RN, “Distilled Sensing: Selective Sampling for Sparse Signal Recovery,” AISTATS 2009.