Adaptation for Objects and Attributes. Kristen Grauman, Department of Computer Science, University of Texas at Austin. With Adriana Kovashka (UT Austin), Boqing Gong (USC), and Fei Sha (USC)


  • Adaptation for Objects and Attributes

Kristen Grauman, Department of Computer Science

    University of Texas at Austin

    With Adriana Kovashka (UT Austin), Boqing Gong (USC), and Fei Sha (USC)

  • Learning-based visual recognition

Last 10+ years: impressive strides by learning appearance models (usually discriminative).

[Figure: annotator labels training images ("CAR" / "NOT CAR"); image features feed a learned model that predicts "CAR!" on a new image.]

  • Typical assumptions

1. Test set will look like the training set.
    2. Human labelers "see" the same thing.

  • Mismatched domains

TRAIN: Flickr → TEST: YouTube

• TRAIN: Catalog images → TEST: Mobile phone photos

    Mismatched domains

• TRAIN: ImageNet → TEST: PASCAL VOC

    Mismatched domains

• TRAIN: ImageNet → TEST: PASCAL VOC

    "It is worthwhile to note that, even with 140K training ImageNet images, we do not perform as well as with 5K PASCAL VOC training images."

    – Perronnin et al. CVPR 2010

    Mismatched domains

• Problem: Poor cross-domain generalization
    • Different underlying distributions

    • Overfit to datasets’ idiosyncrasies

    Possible solution: Unsupervised domain adaptation

    Mismatched domains

• Setup: Source domain (with labeled data)

    Target domain (no labels for training)

Objective: Learn a classifier to work well on the target

    Unsupervised domain adaptation

    Different distributions

  • Much recent research

    Correcting sampling bias

    [Shimodaira, ’00]

[Huang et al., Bickel et al., '07] [Sugiyama et al., '08]

    [Sethy et al., ’06]

    [Sethy et al., ’09][This work]

    Adjusting mismatched models

    [Evgeniou and Pontil, ’05]

    [Duan et al., ’09][Duan et al., Daumé III et al., Saenko et al., ’10]

    [Kulis et al., Chen et al., ’11]


    Inferring domain-invariant features

    [Pan et al., ’09]

    [Blitzer et al., ’06] [Gopalan et al., ’11][Chen et al., ’12][Daumé III, ’07]

    [Argyriou et al, ’08] [Gong et al., ’12][Muandet et al., ’13]


• Existing methods attempt to adapt all source data points, including "hard" ones.

    Problem

    Source Target

  • Automatically identify the “most adaptable” instances

    Use them to create series of easier auxiliary domain adaptation tasks

    Our idea

    [Gong et al., ICML 2013]

Problem: Existing methods attempt to adapt all source data points, including "hard" ones.

• Landmarks are labeled source instances distributed similarly to the target domain.

    Landmarks

    Source

Target

    [Gong et al., ICML 2013]

• Landmarks are labeled source instances distributed similarly to the target domain.

Roles: ease adaptation difficulty; provide discrimination (biased to the target)

    Source

    Target

    Landmarks

    [Gong et al., ICML 2013]

  • Landmarks

    Target

    Source

1 Identify landmarks at multiple scales.

    Key steps

    Coarse

    Fine-grained

    [Gong et al., ICML 2013]

  • 2 Construct auxiliary domain adaptation tasks

    3

    Obtain domain-invariant features

    4

    Predict target labels

    Key steps

    [Gong et al., ICML 2013]

  • Objective

    Identifying landmarks

    Source

Target

    [Gong et al., ICML 2013]

• Maximum mean discrepancy (MMD)

    Landmark selection uses the empirical MMD estimate [Gretton et al. '06]: distributions are compared through their mean embeddings in a universal RKHS (via the induced kernel function), and each candidate is the l-th labeled instance from the source domain (see the reconstruction below).

    [Gong et al., ICML 2013]
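For reference, the empirical MMD estimate referenced above [Gretton et al. '06] measures the distance between the source and target kernel mean embeddings in a universal RKHS H with feature map phi (notation here is generic, not necessarily the slide's):

```latex
\widehat{\mathrm{MMD}}(\mathcal{S},\mathcal{T})
  = \Bigl\lVert \tfrac{1}{m}\textstyle\sum_{i=1}^{m}\phi(\mathbf{x}_i)
  - \tfrac{1}{n}\textstyle\sum_{j=1}^{n}\phi(\mathbf{y}_j) \Bigr\rVert_{\mathcal{H}},
\qquad
k(\mathbf{x},\mathbf{x}') = \langle \phi(\mathbf{x}),\phi(\mathbf{x}')\rangle .
```

Every term can be evaluated through the kernel alone, which is why each candidate landmark enters only through the kernel values it induces.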

• Integer programming

    Landmark selection is posed as an integer program with one binary indicator per labeled source instance (see the sketch below).

    Method for identifying landmarks

    [Gong et al., ICML 2013]
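A sketch of the selection problem's general shape (my notation; any additional constraints of the full formulation are omitted here): binary indicators alpha_l pick the subset of source points whose kernel mean best matches the target's.

```latex
\min_{\alpha_1,\dots,\alpha_m\in\{0,1\}}
\Bigl\lVert
  \tfrac{1}{\sum_l \alpha_l}\textstyle\sum_{l=1}^{m}\alpha_l\,\phi(\mathbf{x}_l)
  \;-\;
  \tfrac{1}{n}\textstyle\sum_{j=1}^{n}\phi(\mathbf{y}_j)
\Bigr\rVert_{\mathcal{H}}^{2}
```

Expanding the squared norm leaves only kernel evaluations k(x_l, x_l') and k(x_l, y_j), so the objective depends on the indicators only through the kernel matrices; this structure is what the convex relaxation exploits.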

  • Convex relaxation

    Method for identifying landmarks

    [Gong et al., ICML 2013]
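As a rough illustration of the idea (not the paper's convex program), the following greedy sketch picks source points that minimize the empirical MMD to the target under a Gaussian kernel; running it with several bandwidths sigma gives the multiple landmark sets discussed next. All names and hyperparameters are illustrative.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def greedy_landmarks(Xs, Xt, sigma, n_landmarks):
    """Greedy stand-in for landmark selection: repeatedly add the source point
    that most reduces the (squared) empirical MMD between the chosen landmarks
    and the target sample. The actual method poses this as an integer program
    over binary indicators and solves a convex relaxation; this is only a sketch."""
    Kss = gaussian_kernel(Xs, Xs, sigma)
    Kst = gaussian_kernel(Xs, Xt, sigma)
    mean_to_target = Kst.mean(axis=1)      # <phi(x_i), target mean embedding>
    selected = []
    for _ in range(n_landmarks):
        best_i, best_obj = None, np.inf
        for i in range(len(Xs)):
            if i in selected:
                continue
            S = selected + [i]
            # Squared MMD up to a constant target-only term (irrelevant to the argmin).
            obj = Kss[np.ix_(S, S)].mean() - 2.0 * mean_to_target[S].mean()
            if obj < best_obj:
                best_i, best_obj = i, obj
        selected.append(best_i)
    return selected
```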

• Gaussian kernels: how to choose the bandwidth?

    Our solution: examine distributions at multiple granularities

    Multiple bandwidths → multiple sets of landmarks

    Scale for landmark similarity?

    [Gong et al., ICML 2013]

  • Landmarks at multiple scales

[Figure: landmarks selected at multiple scales (σ = 2^6, 2^0, 2^-3) for "headphone" and "mug" examples, showing target, source, and unselected points.]

    [Gong et al., ICML 2013]

  • 2 Construct auxiliary domain adaptation tasks

    Key steps

    [Gong et al., ICML 2013]

  • Constructing easier auxiliary tasks

    Target

    Source

    Landmarks

    At each scale σ

    Intuition: distributions are closer (cf. Theorem 1)

    [Gong et al., ICML 2013]

  • At each scale σ

    Intuition: distributions are closer (cf. Theorem 1)

    New target

    New source

    Landmarks

    Constructing easier auxiliary tasks

    [Gong et al., ICML 2013]

• Each task provides a new basis of features via the geodesic flow kernel (GFK) [Gong et al. '12]:
    - Integrate out domain changes
    - Obtain a domain-invariant representation

    Constructing easier auxiliary tasks

    [Gong et al., CVPR 2012]
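A compact sketch of what a GFK computes, under the usual setup (PCA subspaces for each domain). The paper derives a closed form via principal angles; this illustration integrates along the geodesic numerically instead. Variable names and the numerical-integration shortcut are mine, not the paper's.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X (columns of the returned D x d matrix).
    Assumes at least d samples and d <= feature dimension."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def gfk(Xs, Xt, d, steps=50):
    """Sketch of the geodesic flow kernel idea (Gong et al., CVPR 2012):
    average the subspace projections along the geodesic between the source and
    target PCA subspaces. Returns a D x D matrix G so that x^T G x' acts as a
    domain-invariant inner product."""
    Ps, Pt = pca_basis(Xs, d), pca_basis(Xt, d)          # D x d each
    U, cos_th, Vt = np.linalg.svd(Ps.T @ Pt)             # principal angles
    theta = np.arccos(np.clip(cos_th, -1.0, 1.0))
    A = Ps @ U                                           # rotated source basis
    # Orthonormal directions pointing from the source toward the target subspace
    # (near-zero angles are clipped for numerical stability).
    Q = (Pt @ Vt.T - A @ np.diag(cos_th)) @ np.diag(1.0 / np.maximum(np.sin(theta), 1e-8))
    G = np.zeros((Ps.shape[0], Ps.shape[0]))
    for t in np.linspace(0.0, 1.0, steps):
        Phi = A @ np.diag(np.cos(t * theta)) + Q @ np.diag(np.sin(t * theta))
        G += Phi @ Phi.T / steps
    return G
```

In the landmark pipeline, one such GFK is built per auxiliary task (one per bandwidth), giving a set of domain-invariant kernels to be combined in the next step.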

  • 2 Construct auxiliary domain adaptation tasks

    3

    Obtain domain-invariant features

    MKL

    Key steps

  • Multiple kernel learning on the labeled landmarks

    Arriving at domain-invariant feature space

    Discriminative loss biased to the target

    Combining features discriminatively
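Schematically (my notation, not necessarily the slide's), the per-scale GFK kernels G_sigma_q are combined with learned convex weights, and the weights are chosen by minimizing a classification loss on the labeled landmarks:

```latex
K = \sum_{q} w_q\, G_{\sigma_q},
\qquad
\min_{w_q \ge 0,\; \sum_q w_q = 1}\;
\min_{f \in \mathcal{H}_K}\;
\frac{\lambda}{2}\lVert f \rVert_{\mathcal{H}_K}^{2}
+ \sum_{l \in \text{landmarks}} \ell\bigl(y_l, f(\mathbf{x}_l)\bigr).
```

Because the landmarks are, by construction, the source points that look most like the target, this loss is "biased to the target" even though no target labels are available.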

  • 2 Construct auxiliary domain adaptation tasks

    3

    Obtain domain-invariant features

    4

    Predict target labels

    Key steps

• Four vision datasets/domains for visual object recognition [Griffin et al. '07, Saenko et al. '10]

    Four types of product reviews for sentiment analysis: books, DVDs, electronics, kitchen appliances [Blitzer et al. '07]

    Experiments

  • Cross-dataset object recognition


  • Datasets as domains?

Domain 1, Domain 2, Domain 3, Domain 4, Domain 5

    ASSUMED

  • Datasets as domains?

Domain 1, Domain 2, Domain 3, …, Domain 10

    REALITY

  • Datasets as domains?

Domain 1, Domain 2, Domain 3, …, Domain 10

    Dataset ≠ Domain: cross-dataset adaptation is suboptimal

    REALITY

  • NLP: Language-specific domains

    Speech: Speaker-specific domains

Vision: ?? Pose-specific? Illumination-specific? Occlusion? Image resolution? Background?

    Challenges: many continuous factors (vs. a few discrete ones); factors overlap and interact

    How to define a domain?

  • Discovering latent visual domains

Maximum distinctiveness: the discovered domains should differ maximally from one another, measured by MMD (see the sketch below)

    Maximum learnability: determine K with domain-wise cross-validation

    [Gong et al., NIPS 2013]

    We propose to discover domains – “reshaping” them to cross dataset boundaries
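In rough form (a sketch of the objective's shape, not the exact formulation), images are assigned to K latent domains so that the resulting domains are maximally distinct under an MMD-based measure, subject to each domain remaining learnable; K itself is picked by domain-wise cross-validation:

```latex
\max_{z_{ik}\in\{0,1\},\;\sum_k z_{ik}=1}
\;\sum_{k \ne k'} \mathrm{MMD}\bigl(\mathcal{D}_k, \mathcal{D}_{k'}\bigr),
\qquad
\mathcal{D}_k = \{\mathbf{x}_i : z_{ik} = 1\}.
```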

  • Discovered domain I

    Discovered domain II

    Results: discovering domains

    [Gong et al., NIPS 2013]

• Results: discovering domains

    [Bar charts (accuracy): cross-dataset object recognition and cross-viewpoint action recognition (Domain I vs. Domain II), comparing domains=datasets, Hoffman et al. 2012, and discovered domains (ours).]

  • Summary so far

landmarks: labeled source instances distributed similarly to the target

    auxiliary tasks: provably easier to solve

    discriminative loss despite unlabeled target

    reshaping datasets to latent domains: discover cross-dataset domains, maximally distinct & learnable

  • Typical assumptions

1. Test set will look like the training set.
    2. Human labelers "see" the same thing.

  • Visual attributes

• High-level semantic properties shared by objects
    • Human-understandable and machine-detectable

Examples: brown, indoors, outdoors, flat, four-legged, high heel, red, has-ornaments, metallic

    [Oliva et al. 2001, Ferrari & Zisserman 2007, Kumar et al. 2008, Farhadi et al. 2009, Lampert et al. 2009, Endres et al. 2010, Wang & Mori 2010, Berg et al. 2010, Branson et al. 2010, Parikh & Grauman 2011, …]

  • Standard approach

    Learn one monolithic model per attribute

Vote on labels: Annotators A, B, and C each label "formal" / "not formal".

  • Problem

Binary attribute: Formal? User labels: 50% "yes", 50% "no"

    Relative attribute: More ornamented? User labels: 50% "first", 20% "second", 30% "equally"

    There may be valid perceptual differences within an attribute.

• Overweight? Or just chubby?

    Fine-grained meaning

    Imprecision of attributes

  • Is formal?

    = formal wear for a conference? OR

    = formal wear for a wedding?

    Context

    Imprecision of attributes

  • Is blue or green?

    English: “blue”

    Russian: “neither” (“голубой” vs. “синий”)

    Japanese: “both”(“青” = blue and green)

    Cultural

    Imprecision of attributes

  • But do we need to be that precise?

Yes. Applications like image search require that the user's perception match the system's predictions.

    [WhittleSearch, Kovashka et al. CVPR 2012]

    “less formal than these”

    “white high heels”

  • Our idea

    [Kovashka and Grauman, ICCV 2013]

    • Treat learning perceived attributes as an adaptation problem.

    • Adapt generic attribute model with minimal user-specific labeled examples.

• Obtain implicit user-specific labels from the user's search history.

• Vote on labels per user: each annotator's own "formal" / "not formal" labels are kept separate rather than pooled.

    [Kovashka and Grauman, ICCV 2013]

    Our idea

• • Adapting binary attribute classifiers [J. Yang et al., ICDM 2007]:

    Learning adapted attributes

    Given user-labeled data and the generic model, learn a user-specific classifier that fits the user's labels while staying close to the generic model (see the sketch below).
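A minimal sketch in the spirit of the Adaptive SVM of Yang et al.: the user-specific classifier equals the generic classifier plus a learned perturbation, regularized so it falls back to the generic model when user labels are scarce. Function names, the optimizer, and the hyperparameters are illustrative, not the paper's.

```python
import numpy as np

def adapt_binary_attribute(X_user, y_user, w_generic, b_generic,
                           lam=1.0, lr=0.01, iters=500):
    """Learn f(x) = (w_generic + dw) . x + b_generic from a few user labels
    (y in {-1, +1}), minimizing lam/2 * ||dw||^2 + sum of hinge losses with
    plain subgradient descent. Keeping ||dw|| small keeps the adapted model
    close to the generic one."""
    dw = np.zeros_like(w_generic, dtype=float)
    for _ in range(iters):
        scores = X_user @ (w_generic + dw) + b_generic
        viol = y_user * scores < 1.0                      # hinge-active examples
        grad = lam * dw - (y_user[viol, None] * X_user[viol]).sum(axis=0)
        dw -= lr * grad
    return w_generic + dw, b_generic
```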

• • Adapting relative attribute rankers [B. Geng et al., TKDE 2010]:

    Learning adapted attributes

    Given user-labeled ordered pairs and the generic ranking model, learn a user-specific ranker that respects the user's comparisons while staying close to the generic one (see the sketch below).
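The relative case can be sketched the same way: each user comparison "image i is more <attribute> than image j" becomes a positively labeled difference vector, and the generic ranking weights are perturbed toward the user's ordering. This is an illustration, not Geng et al.'s exact formulation; it reuses adapt_binary_attribute from the sketch above, and X_more, X_less, and w_rank_generic are assumed inputs.

```python
import numpy as np

# X_more[k], X_less[k]: features of the k-th user-ordered pair
# ("more" image minus "less" image); all pair labels are +1.
X_pairs = X_more - X_less
w_user_rank, _ = adapt_binary_attribute(X_pairs, np.ones(len(X_pairs)),
                                        w_rank_generic, b_generic=0.0)
```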

  • Collecting user-specific labels

• Explicitly, from actively requested labels: seek labels on uncertain and diverse images

    • Implicitly, from search history:
      o Transitivity
      o Contradictions

    "My target is less formal than [image A] and more formal than [image B]" implies [A] is more formal than [B]

  • Inferring implicit labels

    more sporty


    “Target is more sporty than B”

    “Target is less sporty than A”

    less sporty


    A

    B

    User’s feedback history can reveal mismatch in perceived and predicted attributes

  • User’s feedback history can reveal mismatch in perceived and predicted attributes

    more sporty


    “Target is more sporty than B”

    A C

    B

    more feminine (~ less sporty)


    “Target is more feminine than A”

    Inferring implicit labels
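A small sketch of how implicit pairwise labels can be read off a feedback history by transitivity through the (unseen) target; the data layout is illustrative, not the system's actual bookkeeping.

```python
def implicit_pairwise_labels(feedback):
    """feedback: list of (image_id, attribute, relation) tuples, where relation is
    'more' if the user said the target is MORE <attribute> than the image,
    and 'less' if the target is LESS <attribute> than it.
    Returns (i, j, attribute) triples meaning: image i shows the attribute
    more than image j, implied because  j < target < i."""
    pairs = []
    for img_a, attr_a, rel_a in feedback:
        for img_b, attr_b, rel_b in feedback:
            if attr_a != attr_b or img_a == img_b:
                continue
            if rel_a == 'less' and rel_b == 'more':   # target < img_a, target > img_b
                pairs.append((img_a, img_b, attr_a))
    return pairs

# Example: "target is less sporty than A" + "target is more sporty than B"
# implies A is more sporty than B.
print(implicit_pairwise_labels([('A', 'sporty', 'less'), ('B', 'sporty', 'more')]))
# -> [('A', 'B', 'sporty')]
```

The slides' contradiction example works similarly, by comparing such implied orderings against the generic model's predicted ordering.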

• Datasets

    SUN Attributes: 14,340 scene images; 12 attributes: "sailing", "hiking", "vacationing", "open area", "vegetation", etc.

    Shoes: 14,658 shoe images; 10 attributes: "pointy", "bright", "high-heeled", "feminine", etc.

  • Adapted attribute accuracy

• 3 datasets • 22 attributes • 75 total users


  • Adaptation approach most accurately captures perceived attributes

    Adapted attribute accuracy

    [Kovashka and Grauman, ICCV 2013]

  • Which images most influence adaptation?

Shoes: pointy, open, bright, ornamented, shiny, high-heeled, long, formal, sporty, feminine

    SUN: sailing, vacationing, hiking, camping, socializing, shopping, vegetation, clouds, natural light, cold, open area, horizon far

• Visualizing adapted attributes

    SUN – Binary Attributes – "Vacationing"
    Shoes – Relative Attributes – "Formal"

    [Figure: example images ordered from less to more of each attribute by the generic vs. adapted models.]

• Personalizing image search with adapted attributes

    Example queries: "white shiny heels"; "shinier than [image]"

    [Bar chart: match rate on Shoes-Binary and SUN for generic, generic+, user-exclusive, and user-adaptive models.]

• Impact of implicit labels

    [Bar chart: percentile rank (≈ 71.5–74.5) on Shoes-Relative for explicit labels only, +contradictions, and +transitivity.]

  • Summary

• Practical concerns if learning visual categories:
      Test images can look different from training images!
      People do not perceive image labels universally!

    • Domain adaptation methods help address them:
      Landmark-based unsupervised adaptation
      Reshaping datasets into latent domains
      Adapting generic models to account for user-specific perception of attributes

• References

    • Attribute Adaptation for Personalized Image Search. A. Kovashka and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013.

    • Reshaping Visual Datasets for Domain Adaptation. B. Gong, K. Grauman, and F. Sha. In Proceedings of Advances in Neural Information Processing Systems (NIPS), Tahoe, Nevada, December 2013.

    • Connecting the Dots with Landmarks: Discriminatively Learning Domain-Invariant Features for Unsupervised Domain Adaptation. B. Gong, K. Grauman, and F. Sha. In International Conference on Machine Learning (ICML), Atlanta, GA, June 2013.

    • Geodesic Flow Kernel for Unsupervised Domain Adaptation. B. Gong, Y. Shi, F. Sha, and K. Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012.
