Discovering Latent Domains for Multisource Domain Adaptation
Judy Hoffman1,2, Brian Kulis3, Trevor Darrell1,2, Kate Saenko4
1UC Berkeley; 2ICSI; 3Ohio State University; 4University of Massachusetts, Lowell
Women in Machine Learning Workshop: December 3, 2012
Hoffman et al. (UC Berkeley), Discovering Latent Domains, WiML, December 3, 2012
Background: Object Recognition
Training Images
Test Image: Correct!
Test Image: Incorrect
Background: Domain Adaptation
Domain adaptation helps bridge the performance gap when test data is drawn from a different distribution (domain) than the training data.
Previous Methods Include
Feature Transformation Techniques: Saenko (ECCV 2010), Kulis (CVPR 2011), Gopalan (ICCV 2011), Gong (CVPR 2012), . . .
Parameter Adaptation Techniques: Yang (ACM Multimedia 2007), Duan (CVPR 2009), Bergamo (NIPS 2010), Aytar (ICCV 2011), . . .
All previous methods require datasets separated into homogeneous domains.
Goal: Separate heterogeneous data into homogeneous domains
Method 1/4: Separate by category label
Method 2/4: Cluster each category independently
Method 3/4: Constrained clustering algorithm
Method 4/4: Iterate steps (2-3) - Output domains
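Steps 1-4 can be sketched as a small constrained-clustering loop. This is a toy NumPy sketch, not the paper's exact procedure: nearest-mean local clustering and a greedy one-to-one matching stand in for the full constrained optimization, and all function names are illustrative.

```python
import numpy as np

def local_kmeans(Xc, S, n_iter, rng):
    """Step 2: cluster one category's points into S local clusters."""
    mu = Xc[rng.choice(len(Xc), S, replace=False)]
    for _ in range(n_iter):
        z = ((Xc[:, None] - mu[None]) ** 2).sum(-1).argmin(1)
        for s in range(S):
            if (z == s).any():
                mu[s] = Xc[z == s].mean(0)
    return mu

def match_one_per_domain(mu_c, m):
    """Step 3 helper: one-to-one matching of a category's S local clusters
    to the S domains (the do-not-link constraint). Greedy here; Hungarian
    matching would be exact."""
    S = len(m)
    cost = ((mu_c[:, None] - m[None]) ** 2).sum(-1)
    pairs = sorted((cost[r, k], r, k) for r in range(S) for k in range(S))
    match = np.empty(S, dtype=int)
    used_r, used_k = set(), set()
    for _, r, k in pairs:
        if r not in used_r and k not in used_k:
            match[r] = k
            used_r.add(r)
            used_k.add(k)
    return match  # match[r] = domain index of local cluster r

def discover_domains(X, y, S, n_iter=10, seed=0):
    """Toy latent-domain discovery: steps 1-4 of the method above."""
    rng = np.random.default_rng(seed)
    cats = list(np.unique(y))
    # Step 1: separate by category; Step 2: cluster each independently.
    mu = {c: local_kmeans(X[y == c], S, n_iter, rng) for c in cats}
    m = mu[cats[0]].copy()                   # initial domain means
    for _ in range(n_iter):                  # Step 4: iterate step 3
        groups = {k: [] for k in range(S)}
        for c in cats:
            match = match_one_per_domain(mu[c], m)
            for r in range(S):
                groups[match[r]].append(mu[c][r])
        m = np.stack([np.mean(groups[k], axis=0) for k in range(S)])
    # Output: each point inherits the domain of its nearest local cluster.
    labels = np.empty(len(X), dtype=int)
    for c in cats:
        match = match_one_per_domain(mu[c], m)
        z = ((X[y == c][:, None] - mu[c][None]) ** 2).sum(-1).argmin(1)
        labels[y == c] = match[z]
    return labels, m
```

On well-separated synthetic data (two categories, each drawn from two shifted blobs), this recovers the two underlying "domains" up to a permutation of labels.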
Hierarchical Gaussian Mixture Model

[Figure 1: Graphical model for domain discovery — a plate over local clusters j = 1, …, J containing z^G_j and μ_j (with global means m and weights π^G), and a plate over data points i = 1, …, n containing z^L_i and x_i (with weights π^L).]

\[
p(z^G_j \mid \pi^G) = \prod_{k=1}^{S} \bigl(\pi^G_k\bigr)^{z^G_{jk}}, \qquad
p(z^L_i \mid \pi^L) = \prod_{j=1}^{J} \bigl(\pi^L_j\bigr)^{z^L_{ij}},
\]
\[
p(\mu_j \mid z^G_j, m) = \prod_{k=1}^{S} \mathcal{N}\bigl(\mu_j \mid m_k, \sigma I\bigr)^{z^G_{jk}}, \qquad
p(x_i \mid z^L_i, \mu) = \prod_{j=1}^{J} \mathcal{N}\bigl(x_i \mid \mu_j, \sigma I\bigr)^{z^L_{ij}}.
\]

The joint probability is the product of the conditional probabilities in the model. Let Z^L = {z^L_i}, Z^G = {z^G_j}, and X = [x_1, …, x_n]. Then

\[
p(Z^G, Z^L, X \mid m, \mu, \pi^G, \pi^L) =
\prod_{i=1}^{n} \prod_{j=1}^{J} \prod_{k=1}^{S}
\bigl(\pi^G_k\bigr)^{z^G_{jk}} \bigl(\pi^L_j\bigr)^{z^L_{ij}}
\mathcal{N}\bigl(\mu_j \mid m_k, \sigma I\bigr)^{z^G_{jk}}
\mathcal{N}\bigl(x_i \mid \mu_j, \sigma I\bigr)^{z^L_{ij}}.
\]

We may think of this model generatively as follows: for each local cluster j = 1, …, J = S · K, we determine its underlying domain via the π^G mixing weights. Given this domain, we generate μ_j, the mean of the local cluster. Then, to generate a data point x_i, we first determine which local cluster j it belongs to using π^L, and generate the point from the corresponding μ_j.

At this point we still need to add the constraints discussed above, namely that local clusters contain data points from only a single category, and that each domain cluster contains only a single local cluster from each category. Such constraints can be difficult to enforce in a probabilistic model, but are considerably simpler in a hard clustering model; in semi-supervised clustering there is a considerable literature on adding constraints or penalties to the k-means objective (cite papers by Sugato Basu). We therefore apply a standard asymptotic argument to our probabilistic model to obtain a corresponding hard clustering model: if one takes a mixture of Gaussians with fixed σI covariances across clusters and lets σ → 0, the expected log joint likelihood approaches the k-means objective, and the EM updates reduce to the k-means updates.

Notation:
x — feature vector (with known category label y)
μ — mean of a local cluster
Z^L — assignments to local clusters
Z^G — assignments to global (domain) clusters
m — mean of a global (domain) cluster
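The generative story reads directly as ancestral sampling. A minimal sketch, with simplifying assumptions: uniform mixing weights for π^G and π^L, and σ treated as a common standard deviation (the model's σI covariance, up to a square root).

```python
import numpy as np

def sample_hgmm(n, J, S, D, sigma=0.5, seed=0):
    """Ancestral sampling from the hierarchical Gaussian mixture:
    domain means m_k -> local-cluster means mu_j -> data points x_i."""
    rng = np.random.default_rng(seed)
    pi_G = np.full(S, 1.0 / S)           # global (domain) mixing weights
    pi_L = np.full(J, 1.0 / J)           # local-cluster mixing weights
    m = rng.normal(0.0, 5.0, size=(S, D))   # global (domain) means
    zG = rng.choice(S, size=J, p=pi_G)      # domain of each local cluster
    mu = rng.normal(m[zG], sigma)           # mu_j ~ N(m_{zG_j}, sigma)
    zL = rng.choice(J, size=n, p=pi_L)      # local cluster of each point
    x = rng.normal(mu[zL], sigma)           # x_i ~ N(mu_{zL_i}, sigma)
    return x, zL, zG, mu, m
```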
Optimization Formulation

\[
\min_{Z^G, Z^L, \mu, m} \;
\sum_{i=1}^{n} \sum_{j=1}^{J} Z^L_{ij}\, \lVert x_i - \mu_j \rVert^2
\;+\; \sum_{j=1}^{J} \sum_{k=1}^{S} Z^G_{jk}\, \lVert \mu_j - m_k \rVert^2
\]

subject to:

\[
\forall j, k:\; Z^G_{jk} \in \{0, 1\}, \qquad \forall i, j:\; Z^L_{ij} \in \{0, 1\}
\]
\[
\forall j:\; \sum_{k=1}^{S} Z^G_{jk} = 1, \qquad \forall i:\; \sum_{j=1}^{J} Z^L_{ij} = 1
\]
\[
\forall j:\; \sum_{i=1}^{n} \mathbb{I}\bigl(y_i \neq \mathrm{label}(j)\bigr)\, Z^L_{ij} = 0 \quad \text{(category separated)}
\]
\[
\forall k,\; \forall c \in \{1, \dots, K\}:\; \sum_{j=1}^{J} \mathbb{I}\bigl(\mathrm{label}(j) = c\bigr)\, Z^G_{jk} = 1 \quad \text{(do-not-link)}
\]
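The objective and its constraints transcribe almost line for line into code. A sketch with illustrative helper names, assuming dense 0/1 assignment matrices `ZL` (n × J) and `ZG` (J × S):

```python
import numpy as np

def objective(X, mu, m, ZL, ZG):
    """Two-term k-means-style objective:
    sum_ij ZL_ij ||x_i - mu_j||^2  +  sum_jk ZG_jk ||mu_j - m_k||^2."""
    local = (ZL * ((X[:, None] - mu[None]) ** 2).sum(-1)).sum()
    glob = (ZG * ((mu[:, None] - m[None]) ** 2).sum(-1)).sum()
    return local + glob

def check_constraints(ZL, ZG, y, label_of_cluster):
    """Verify the four constraint families from the formulation above."""
    # binary indicators, and each point / cluster assigned exactly once
    assert set(np.unique(ZL)) <= {0, 1} and set(np.unique(ZG)) <= {0, 1}
    assert (ZL.sum(1) == 1).all() and (ZG.sum(1) == 1).all()
    # category separated: a point may only join clusters of its own category
    for i, j in zip(*np.nonzero(ZL)):
        assert y[i] == label_of_cluster[j]
    # do-not-link: each domain holds exactly one local cluster per category
    cats = np.unique(np.asarray(label_of_cluster))
    for k in range(ZG.shape[1]):
        for c in cats:
            cnt = sum(ZG[j, k] for j in range(ZG.shape[0])
                      if label_of_cluster[j] == c)
            assert cnt == 1
```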
Results
Office dataset with three known domains: amazon (a), webcam (w), dslr (d)
31 categories, 10-20 images per category
[Bar chart: clustering accuracy (%) for each pair of mixed domains — (a,w), (a,d), (a,wB), (a,aR), (w,d), (w,wB), (w,aR), (d,wB), (d,aR), (wB,aR) — and the mean, comparing k-means, constrained hard HDP, and our method.]
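Clustering accuracy here means the best agreement between discovered and true domain labels over a relabeling of the clusters. A tiny helper (illustrative, not the paper's evaluation code; brute-force permutation search, fine for the small numbers of domains used here):

```python
import itertools
import numpy as np

def clustering_accuracy(pred, true):
    """Best label agreement over all permutations of cluster ids."""
    ids = np.unique(pred)
    best = 0.0
    for perm in itertools.permutations(np.unique(true)):
        mapping = dict(zip(ids, perm))
        remapped = np.array([mapping[p] for p in pred])
        best = max(best, (remapped == true).mean())
    return best
```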
Results
Bing web search dataset: heterogeneous and weakly-labeled data.
30 categories, 50 images per category; number of domains set to 3
Discovered domains: (a) "Cartoon Images", (b) "Cluttered Scenes", (c) "Product Images"
[Line plot: object classification accuracy on the Bing Web Search / Caltech256 task as a function of a log-scaled parameter, comparing KNN with no domain adaptation, ARC-t (single source), and our method (multisource with latent domains).]
J. Hoffman, B. Kulis, T. Darrell, K. Saenko. "Discovering Latent Domains for Multisource Domain Adaptation." European Conference on Computer Vision (ECCV), 2012.
Conclusion
Domain adaptation algorithms can bridge the performance gap when train and test data are drawn from different domains.
Current multisource adaptation methods require known domain labels.
Using a constrained hierarchical Gaussian mixture model, we are able to learn latent domain labels.
See our ECCV paper for more details and our full multisource domain adaptation algorithm.
Thank you!