Discovering Latent Domains for Multisource Domain Adaptation
Judy Hoffman1,2, Brian Kulis3, Trevor Darrell1,2, Kate Saenko4
1UC Berkeley; 2ICSI; 3Ohio State University; 4University of Massachusetts, Lowell
Women in Machine Learning Workshop: December 3, 2012
Hoffman et al. (UC Berkeley), Discovering Latent Domains, WiML, December 3, 2012
Background: Object Recognition
Training Images
Test Image: Correct!
Test Image: Incorrect
Background: Domain Adaptation
Domain adaptation helps bridge the performance gap when test data is drawn from a different distribution (domain) than the training data.
Previous Methods Include
Feature Transformation Techniques: Saenko (ECCV 2010), Kulis (CVPR 2011), Gopalan (ICCV 2011), Gong (CVPR 2012), . . .
Parameter Adaptation Techniques: Yang (ACM Multimedia 2007), Duan (CVPR 2009), Bergamo (NIPS 2010), Aytar (ICCV 2011), . . .
All previous methods require datasets separated into homogeneous domains.
Goal: Separate heterogeneous data into homogeneous domains
Method 1/4: Separate by category label
Method 2/4: Cluster each category independently
Method 3/4: Constrained clustering algorithm
Method 4/4: Iterate steps (2-3) - Output domains
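Steps 1-4 can be sketched as a small constrained-clustering loop. This is a toy NumPy sketch, not the paper's exact procedure: nearest-mean local clustering and a greedy one-to-one matching stand in for the full constrained optimization, and all function names are illustrative.

```python
import numpy as np

def local_kmeans(Xc, S, n_iter, rng):
    """Step 2: cluster one category's points into S local clusters."""
    mu = Xc[rng.choice(len(Xc), S, replace=False)]
    for _ in range(n_iter):
        z = ((Xc[:, None] - mu[None]) ** 2).sum(-1).argmin(1)
        for s in range(S):
            if (z == s).any():
                mu[s] = Xc[z == s].mean(0)
    return mu

def match_one_per_domain(mu_c, m):
    """Step 3 helper: one-to-one matching of a category's S local clusters
    to the S domains (the do-not-link constraint). Greedy here; Hungarian
    matching would be exact."""
    S = len(m)
    cost = ((mu_c[:, None] - m[None]) ** 2).sum(-1)
    pairs = sorted((cost[r, k], r, k) for r in range(S) for k in range(S))
    match = np.empty(S, dtype=int)
    used_r, used_k = set(), set()
    for _, r, k in pairs:
        if r not in used_r and k not in used_k:
            match[r] = k
            used_r.add(r)
            used_k.add(k)
    return match  # match[r] = domain index of local cluster r

def discover_domains(X, y, S, n_iter=10, seed=0):
    """Toy latent-domain discovery: steps 1-4 of the method above."""
    rng = np.random.default_rng(seed)
    cats = list(np.unique(y))
    # Step 1: separate by category; Step 2: cluster each independently.
    mu = {c: local_kmeans(X[y == c], S, n_iter, rng) for c in cats}
    m = mu[cats[0]].copy()                   # initial domain means
    for _ in range(n_iter):                  # Step 4: iterate step 3
        groups = {k: [] for k in range(S)}
        for c in cats:
            match = match_one_per_domain(mu[c], m)
            for r in range(S):
                groups[match[r]].append(mu[c][r])
        m = np.stack([np.mean(groups[k], axis=0) for k in range(S)])
    # Output: each point inherits the domain of its nearest local cluster.
    labels = np.empty(len(X), dtype=int)
    for c in cats:
        match = match_one_per_domain(mu[c], m)
        z = ((X[y == c][:, None] - mu[c][None]) ** 2).sum(-1).argmin(1)
        labels[y == c] = match[z]
    return labels, m
```

On well-separated synthetic data (two categories, each drawn from two shifted blobs), this recovers the two underlying "domains" up to a permutation of labels.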
Hierarchical Gaussian Mixture Model

[Figure 1: Graphical model for domain discovery — a plate over local clusters j = 1, …, J containing z^G_j and μ_j (with global means m and weights π^G), and a plate over data points i = 1, …, n containing z^L_i and x_i (with weights π^L).]

\[
p(z^G_j \mid \pi^G) = \prod_{k=1}^{S} \bigl(\pi^G_k\bigr)^{z^G_{jk}}, \qquad
p(z^L_i \mid \pi^L) = \prod_{j=1}^{J} \bigl(\pi^L_j\bigr)^{z^L_{ij}},
\]
\[
p(\mu_j \mid z^G_j, m) = \prod_{k=1}^{S} \mathcal{N}\bigl(\mu_j \mid m_k, \sigma I\bigr)^{z^G_{jk}}, \qquad
p(x_i \mid z^L_i, \mu) = \prod_{j=1}^{J} \mathcal{N}\bigl(x_i \mid \mu_j, \sigma I\bigr)^{z^L_{ij}}.
\]

The joint probability is the product of the conditional probabilities in the model. Let Z^L = {z^L_i}, Z^G = {z^G_j}, and X = [x_1, …, x_n]. Then

\[
p(Z^G, Z^L, X \mid m, \mu, \pi^G, \pi^L) =
\prod_{i=1}^{n} \prod_{j=1}^{J} \prod_{k=1}^{S}
\bigl(\pi^G_k\bigr)^{z^G_{jk}} \bigl(\pi^L_j\bigr)^{z^L_{ij}}
\mathcal{N}\bigl(\mu_j \mid m_k, \sigma I\bigr)^{z^G_{jk}}
\mathcal{N}\bigl(x_i \mid \mu_j, \sigma I\bigr)^{z^L_{ij}}.
\]

We may think of this model generatively as follows: for each local cluster j = 1, …, J = S · K, we determine its underlying domain via the π^G mixing weights. Given this domain, we generate μ_j, the mean of the local cluster. Then, to generate a data point x_i, we first determine which local cluster j it belongs to using π^L, and generate the point from the corresponding μ_j.

At this point we still need to add the constraints discussed above, namely that local clusters contain data points from only a single category, and that each domain cluster contains only a single local cluster from each category. Such constraints can be difficult to enforce in a probabilistic model, but are considerably simpler in a hard clustering model; in semi-supervised clustering there is a considerable literature on adding constraints or penalties to the k-means objective (cite papers by Sugato Basu). We therefore apply a standard asymptotic argument to our probabilistic model to obtain a corresponding hard clustering model: if one takes a mixture of Gaussians with fixed σI covariances across clusters and lets σ → 0, the expected log joint likelihood approaches the k-means objective, and the EM updates reduce to the k-means updates.

Notation:
x — feature vector (with known category label y)
μ — mean of a local cluster
Z^L — assignments to local clusters
Z^G — assignments to global (domain) clusters
m — mean of a global (domain) cluster
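The generative story reads directly as ancestral sampling. A minimal sketch, with simplifying assumptions: uniform mixing weights for π^G and π^L, and σ treated as a common standard deviation (the model's σI covariance, up to a square root).

```python
import numpy as np

def sample_hgmm(n, J, S, D, sigma=0.5, seed=0):
    """Ancestral sampling from the hierarchical Gaussian mixture:
    domain means m_k -> local-cluster means mu_j -> data points x_i."""
    rng = np.random.default_rng(seed)
    pi_G = np.full(S, 1.0 / S)           # global (domain) mixing weights
    pi_L = np.full(J, 1.0 / J)           # local-cluster mixing weights
    m = rng.normal(0.0, 5.0, size=(S, D))   # global (domain) means
    zG = rng.choice(S, size=J, p=pi_G)      # domain of each local cluster
    mu = rng.normal(m[zG], sigma)           # mu_j ~ N(m_{zG_j}, sigma)
    zL = rng.choice(J, size=n, p=pi_L)      # local cluster of each point
    x = rng.normal(mu[zL], sigma)           # x_i ~ N(mu_{zL_i}, sigma)
    return x, zL, zG, mu, m
```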
Optimization Formulation

\[
\min_{Z^G, Z^L, \mu, m} \;
\sum_{i=1}^{n} \sum_{j=1}^{J} Z^L_{ij}\, \lVert x_i - \mu_j \rVert^2
\;+\; \sum_{j=1}^{J} \sum_{k=1}^{S} Z^G_{jk}\, \lVert \mu_j - m_k \rVert^2
\]

subject to:

\[
\forall j, k:\; Z^G_{jk} \in \{0, 1\}, \qquad \forall i, j:\; Z^L_{ij} \in \{0, 1\}
\]
\[
\forall j:\; \sum_{k=1}^{S} Z^G_{jk} = 1, \qquad \forall i:\; \sum_{j=1}^{J} Z^L_{ij} = 1
\]
\[
\forall j:\; \sum_{i=1}^{n} \mathbb{I}\bigl(y_i \neq \mathrm{label}(j)\bigr)\, Z^L_{ij} = 0 \quad \text{(category separated)}
\]
\[
\forall k,\; \forall c \in \{1, \dots, K\}:\; \sum_{j=1}^{J} \mathbb{I}\bigl(\mathrm{label}(j) = c\bigr)\, Z^G_{jk} = 1 \quad \text{(do-not-link)}
\]
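The objective and its constraints transcribe almost line for line into code. A sketch with illustrative helper names, assuming dense 0/1 assignment matrices `ZL` (n × J) and `ZG` (J × S):

```python
import numpy as np

def objective(X, mu, m, ZL, ZG):
    """Two-term k-means-style objective:
    sum_ij ZL_ij ||x_i - mu_j||^2  +  sum_jk ZG_jk ||mu_j - m_k||^2."""
    local = (ZL * ((X[:, None] - mu[None]) ** 2).sum(-1)).sum()
    glob = (ZG * ((mu[:, None] - m[None]) ** 2).sum(-1)).sum()
    return local + glob

def check_constraints(ZL, ZG, y, label_of_cluster):
    """Verify the four constraint families from the formulation above."""
    # binary indicators, and each point / cluster assigned exactly once
    assert set(np.unique(ZL)) <= {0, 1} and set(np.unique(ZG)) <= {0, 1}
    assert (ZL.sum(1) == 1).all() and (ZG.sum(1) == 1).all()
    # category separated: a point may only join clusters of its own category
    for i, j in zip(*np.nonzero(ZL)):
        assert y[i] == label_of_cluster[j]
    # do-not-link: each domain holds exactly one local cluster per category
    cats = np.unique(np.asarray(label_of_cluster))
    for k in range(ZG.shape[1]):
        for c in cats:
            cnt = sum(ZG[j, k] for j in range(ZG.shape[0])
                      if label_of_cluster[j] == c)
            assert cnt == 1
```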
Results
Office dataset with three known domains: amazon (a), webcam (w), dslr (d)
31 categories, 10-20 images per category
[Bar chart: clustering accuracy (%) for each pair of mixed domains — (a,w), (a,d), (a,wB), (a,aR), (w,d), (w,wB), (w,aR), (d,wB), (d,aR), (wB,aR) — and the mean, comparing k-means, constrained hard HDP, and our method.]
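Clustering accuracy here means the best agreement between discovered and true domain labels over a relabeling of the clusters. A tiny helper (illustrative, not the paper's evaluation code; brute-force permutation search, fine for the small numbers of domains used here):

```python
import itertools
import numpy as np

def clustering_accuracy(pred, true):
    """Best label agreement over all permutations of cluster ids."""
    ids = np.unique(pred)
    best = 0.0
    for perm in itertools.permutations(np.unique(true)):
        mapping = dict(zip(ids, perm))
        remapped = np.array([mapping[p] for p in pred])
        best = max(best, (remapped == true).mean())
    return best
```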
Results
Bing web search dataset: heterogeneous and weakly-labeled data.
30 categories, 50 images per category; number of domains set to 3
Discovered domains: (a) "Cartoon Images", (b) "Cluttered Scenes", (c) "Product Images"
[Line plot: object classification accuracy on the Bing Web Search / Caltech256 task as a function of a log-scaled parameter, comparing KNN with no domain adaptation, ARC-t (single source), and our method (multisource with latent domains).]
J. Hoffman, B. Kulis, T. Darrell, K. Saenko. "Discovering Latent Domains for Multisource Domain Adaptation." European Conference on Computer Vision (ECCV), 2012.
Conclusion
Domain adaptation algorithms can bridge the performance gap when train and test data are drawn from different domains.
Current multisource adaptation methods require known domain labels.
Using a constrained hierarchical Gaussian mixture model, we are able to learn latent domain labels.
See our ECCV paper for more details and our full multisource domain adaptation algorithm.
Thank you!