Adapted Deep Embeddings: A Synthesis of Methods for k-Shot
Inductive Transfer Learning
Tyler R. Scott (1,2), Karl Ridgeway (1,2), Michael C. Mozer (1,3)
1: University of Colorado, Boulder  2: Sensory Inc.  3: Presently at Google Brain
Inductive Transfer Learning: Weight Transfer (Yosinski et al., 2014)
• Train a network on source-domain data
• Transfer its weights and retrain the output layer for the target classes
• Adapt the transferred weights to the target domain using target-domain data
• The resulting model maps target-domain inputs to target-domain predictions
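As a rough illustration of the weight-transfer recipe, the sketch below trains a fresh output layer on k-shot target data on top of transferred hidden-layer weights. Everything is a synthetic stand-in (random "pretrained" weights, random target data, made-up dimensions); a real implementation would transfer weights from a network actually trained on the source domain.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def xent(p, y):
    # mean cross-entropy over examples
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

d_in, d_hid, n_tgt = 8, 16, 5          # illustrative sizes only

# Stand-ins for weights pretrained on the source domain.
W1 = rng.normal(0.0, 0.5, (d_in, d_hid))   # transferred hidden layer
b1 = np.zeros(d_hid)
# The source output layer is discarded; a fresh one is trained for
# the n_tgt target classes ("retrain output").
W2 = rng.normal(0.0, 0.1, (d_hid, n_tgt))

def features(X):
    return np.maximum(0.0, X @ W1 + b1)    # ReLU features

def forward(X):
    return softmax(features(X) @ W2)

# k labeled target examples per class (synthetic for the sketch)
k = 10
X = rng.normal(size=(k * n_tgt, d_in))
y = np.repeat(np.arange(n_tgt), k)

initial_loss = xent(forward(X), y)
for _ in range(300):
    p = forward(X)
    grad_logits = p.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0   # d(xent)/d(logits)
    W2 -= 0.5 * features(X).T @ grad_logits / len(y)
    # Full weight adaptation would also update W1, b1 (typically
    # with a smaller step size) rather than freezing them.
final_loss = xent(forward(X), y)
print(f"cross-entropy: {initial_loss:.3f} -> {final_loss:.3f}")
```

Freezing W1 and b1 corresponds to the "retrain output" step alone; unfreezing them turns this into the "adapt weights" step.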
Inductive Transfer Learning: Deep Metric Learning
• Learn a joint source- and target-domain embedding with the histogram loss (Ustinova & Lempitsky, 2016)
• The loss separates the distribution of within-class distances from the distribution of between-class distances
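The idea behind the histogram loss can be sketched in a few lines: build soft histograms of within-class and between-class cosine similarities, then integrate the probability that a between-class pair is more similar than a within-class pair. This is a simplified NumPy rendering of the idea in Ustinova & Lempitsky (2016), not their reference implementation; the bin count and the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def histogram_loss(emb, labels, n_bins=51):
    """Estimate P(similarity of a between-class pair >= similarity of
    a within-class pair) via soft histograms over [-1, 1]."""
    # L2-normalize so pairwise dot products are cosine similarities
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices(len(emb), k=1)       # each pair counted once
    s_pos = sim[iu][same[iu]]                 # within-class similarities
    s_neg = sim[iu][~same[iu]]                # between-class similarities

    edges = np.linspace(-1.0, 1.0, n_bins)
    delta = edges[1] - edges[0]

    def soft_hist(s):
        # triangular (linear) assignment of each similarity to its
        # two neighbouring bins, normalized to sum to ~1
        w = np.maximum(0.0, 1.0 - np.abs(s[:, None] - edges[None, :]) / delta)
        return w.sum(axis=0) / len(s)

    h_pos, h_neg = soft_hist(s_pos), soft_hist(s_neg)
    # P(s_neg >= s_pos) = sum_r h_neg[r] * CDF_pos(r)
    return float(np.sum(h_neg * np.cumsum(h_pos)))

# Well-separated clusters give a loss near 0; uninformative
# embeddings give a loss near 0.5 (chance).
labels = np.repeat([0, 1], 20)
tight = np.vstack([rng.normal([5, 0], 0.1, (20, 2)),
                   rng.normal([0, 5], 0.1, (20, 2))])
noisy = rng.normal(0, 1, (40, 2))
print(histogram_loss(tight, labels), histogram_loss(noisy, labels))
```

Because the soft binning is piecewise linear in the similarities, the loss is differentiable with respect to the embeddings, which is what makes it usable as a training objective.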
Inductive Transfer Learning: Few-Shot Learning
• Learn a joint source- and target-domain embedding with prototypical nets (Snell et al., 2017)
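A minimal sketch of the prototypical-net classification rule (Snell et al., 2017): each class prototype is the mean of its support-set embeddings, and queries are assigned to the nearest prototype. Raw 2-D vectors stand in for learned embeddings here; the cluster centers and shot counts are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def proto_classify(support, support_labels, queries):
    """Classify queries by squared Euclidean distance to class
    prototypes (support-set means in embedding space)."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # squared distance from each query to each prototype
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # softmax over negative distances gives class probabilities
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return classes[d2.argmin(axis=1)], p

# 3-way, 5-shot episode with synthetic, well-separated clusters
centers = np.array([[0, 0], [10, 0], [0, 10]], dtype=float)
support = np.vstack([rng.normal(c, 0.5, (5, 2)) for c in centers])
support_labels = np.repeat(np.arange(3), 5)
queries = np.vstack([rng.normal(c, 0.5, (10, 2)) for c in centers])
query_labels = np.repeat(np.arange(3), 10)

pred, prob = proto_classify(support, support_labels, queries)
print("episode accuracy:", (pred == query_labels).mean())
```

During training, the negative log of these query probabilities is the episode loss, backpropagated through the embedding network.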
Inductive Transfer Learning: Adapted Deep Embeddings
1. Train the network using an embedding loss (histogram loss or prototypical nets)
2. Adapt the weights using the limited target-domain data
Why hasn't a comparison of these methods been explored?
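The two-stage recipe above can be sketched end to end: stage 1 trains an embedding with a prototypical-style loss on the source classes; stage 2 continues training the same loss on the k-shot target classes. Everything here is a toy stand-in (a linear embedding, synthetic Gaussian classes, finite-difference gradients with backtracking), illustrating only the control flow, not the paper's architecture or training details.

```python
import numpy as np

rng = np.random.default_rng(2)

def proto_loss(W, X, y):
    # prototypical-style loss on a linear embedding Z = X @ W:
    # softmax over negative squared distances to class means
    Z = X @ W
    classes = np.unique(y)
    protos = np.stack([Z[y == c].mean(axis=0) for c in classes])
    d2 = ((Z[:, None] - protos[None]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.searchsorted(classes, y)
    return -logp[np.arange(len(y)), idx].mean()

def train(W, X, y, steps=60, lr=0.2, eps=1e-4):
    # finite-difference gradient descent with backtracking: slow,
    # but keeps the sketch free of autodiff dependencies
    for _ in range(steps):
        g = np.zeros_like(W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                Wp, Wm = W.copy(), W.copy()
                Wp[i, j] += eps
                Wm[i, j] -= eps
                g[i, j] = (proto_loss(Wp, X, y) - proto_loss(Wm, X, y)) / (2 * eps)
        step, base = lr, proto_loss(W, X, y)
        while step > 1e-8:                 # accept only improving steps
            if proto_loss(W - step * g, X, y) < base:
                W = W - step * g
                break
            step /= 2
    return W

# Source domain: 4 classes, many examples; target: 3 new classes, k = 5
d_in, d_emb, k = 4, 2, 5
src_centers = rng.normal(0, 3, (4, d_in))
tgt_centers = rng.normal(0, 3, (3, d_in))
Xs = np.vstack([rng.normal(c, 1.0, (40, d_in)) for c in src_centers])
ys = np.repeat(np.arange(4), 40)
Xt = np.vstack([rng.normal(c, 1.0, (k, d_in)) for c in tgt_centers])
yt = np.repeat(np.arange(3), k)

W0 = rng.normal(0, 0.5, (d_in, d_emb))
W_src = train(W0, Xs, ys)            # stage 1: embedding loss on source
before = proto_loss(W_src, Xt, yt)   # target loss of the static embedding
W_ada = train(W_src, Xt, yt)         # stage 2: adapt on k-shot target data
after = proto_loss(W_ada, Xt, yt)
print(f"target-domain loss: {before:.3f} -> {after:.3f}")
```

The comparison of `before` and `after` mirrors the poster's contrast between static embeddings and adapted embeddings: the adaptation stage can only use the k labeled target examples per class.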
Inductive Transfer Learning: regimes of the three methods
(source domain: 2200 labeled examples per class; target domain: k labeled examples per class)

Method                 # labeled examples per target class (k)
Weight transfer        > 100
Deep metric learning   agnostic
Few-shot learning      < 20
[Figure: MNIST, n = 5. Target-domain test accuracy vs. labeled examples per target class, k ∈ {1, 5, 10, 50, 100, 500, 1000}, comparing Baseline, Weight Adaptation, Prototypical Net, Histogram Loss, Adapted Prototypical Net, and Adapted Histogram Loss.]
[Figure: Isolet, n = 5 and n = 10. Target-domain test accuracy vs. k ∈ {1, 10, 50, 100, 200}.]
[Figure: Omniglot, k = 1, 5, and 10. Test accuracy vs. number of target classes n ∈ {5, 10, 100, 1000}.]
[Figure: tinyImageNet, n = 5, 10, and 50. Test accuracy vs. k ∈ {1, 10, 50, 100, 300}.]
(All panels compare Baseline, Weight Adaptation, Prototypical Net, Histogram Loss, Adapted Prototypical Net, and Adapted Histogram Loss.)
Conclusion
• Weight transfer is the least effective of the three methods for inductive transfer learning
• The histogram loss is robust regardless of the amount of labeled data in the target domain
• Adapted embeddings outperform every previously proposed static-embedding method
Poster #167, Room 210 & 230 AB
Today, 5:00 - 7:00 PM