ECS 289G: Visual Recognition Winter 2018 Yong Jae Lee Department of Computer Science

ECS 289G: Visual Recognition (web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/research_overview.pdf). Overview: Weakly-supervised visual recognition.


Page 1:

ECS 289G: Visual Recognition
Winter 2018

Yong Jae Lee
Department of Computer Science

Page 2:

Plan for today

• Questions?
• Sign-up for paper
• Research overview

Page 3:

Success in modern visual recognition research

Image classification

Object detection
Pose recognition

Semantic segmentation

… and many more

Page 4:

Ingredients for success today

1. Fast computation (GPUs)

2. Large labeled image dataset

3. Network architecture

Which ingredient will be the bottleneck for tomorrow’s success?

Page 5:

Ingredients for success today

2. Large labeled image dataset

3. Network architecture

1. Fast computation (GPUs)

Requires expensive, detailed human supervision

Page 6:

Take semantic segmentation as an example…

Requires pixel-level semantic labels

70,000+ annotation hours for 328K images but only 80 object categories (MS COCO)

Page 7:

Ingredients for success today

2. Large weakly-labeled image dataset

3. Network architecture

1. Fast computation (GPUs)

Page 8:

Fully-supervised vs. Weakly-supervised

{Person, umbrella, hat, … }

Fully-supervised: Pixel-level labels

Weakly-supervised: Image-level labels

Page 9:

Overview: Weakly-supervised visual recognition

1. Learning object detectors with tags

2. Learning visual attributes with pairwise comparisons

3. Learning visual grounding with captions

smile

“clock tower is tall”

plane

Page 10:

Weakly-supervised object detection

Annotators

car / no car

Mine discriminative patches

[Weber et al. 2000, Pandey & Lazebnik 2011, Deselaers et al. 2012, Song et al. 2014, …]

Page 11:

Weakly-supervised object detection

Annotators

Weakly-labeled training images: car / no car

Mine discriminative patches

[Weber et al. 2000, Pandey & Lazebnik 2011, Deselaers et al. 2012, Song et al. 2014, …]

Page 12:

Weakly-supervised object detection

Annotators

Weakly-labeled training images: car / no car

Mine discriminative patches

[Weber et al. 2000, Pandey & Lazebnik 2011, Deselaers et al. 2012, Song et al. 2014, …]

Page 13:

Weakly-supervised object detection

• Supervision is provided at the image level → scalable!

Annotators

Mine discriminative

patchescar

Weakly-labeled training images

[Weber et al. 2000, Pandey & Lazebnik 2011, Deselaers et al. 2012, Song et al. 2014, …]

no car

Page 14:

Weakly-supervised object detection

• Supervision is provided at the image level → scalable!
• Due to intra-class appearance variations, occlusion, and clutter, mined regions often correspond to object parts or include background

Annotators

Weakly-labeled training images: car / no car

Mine discriminative patches

[Weber et al. 2000, Pandey & Lazebnik 2011, Deselaers et al. 2012, Song et al. 2014, …]

Page 15:

Our idea: Track and transfer

Weakly-labeled videos tagged with “car”

Weakly-labeled training images tagged with “car”

[Singh, Xiao, Lee, “Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection”, CVPR 2016]

Page 16:

Transferring tracked object boundaries

Mined positive image region

Page 17:

Transferring tracked object boundaries

Mined positive image region

[Unsupervised video object proposals, Xiao and Lee, CVPR 2016]

• For each video frame, search over scales and locations
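The matching step above amounts to a nearest-neighbor search: compare the mined image region against candidate windows taken at multiple scales and locations in each video frame, and keep the best match. A minimal numpy sketch, using cosine similarity over placeholder feature vectors (the function name and feature shapes are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def best_match(region_feat, frame_feats):
    """Return the index and similarity of the candidate window whose
    feature best matches the mined positive region's feature.

    region_feat: (D,) feature of the mined image region.
    frame_feats: (N, D) features of candidate windows, one per
                 (scale, location) in a video frame.
    """
    r = region_feat / np.linalg.norm(region_feat)
    F = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    sims = F @ r  # cosine similarity per candidate window
    return int(np.argmax(sims)), float(np.max(sims))
```

In the full pipeline, the tracked object boundary of the best-matching video region would then be retrieved and transferred back onto the image.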

Page 18:

Transferring tracked object boundaries

match

Mined positive image region

• For each video frame, search over scales and locations

Page 19:

Transferring tracked object boundaries

match / retrieve

Mined positive image region

• For each video frame, search over scales and locations

Page 20:

Transferring tracked object boundaries

Mined positive image region

• For each video frame, search over scales and locations

Page 21:

Transferring tracked object boundaries

match

Mined positive image region

• For each video frame, search over scales and locations

Page 22:

Transferring tracked object boundaries

match / retrieve

Mined positive image region

• For each video frame, search over scales and locations

Page 23:

Discovered pseudo ground-truth boxes

• Handles atypical pose, partial occlusion, and clutter

Page 24:

Weakly-supervised object detection accuracy

• Significantly outperformed existing weakly-supervised detection methods (at the time)

Page 25:

Overview: Weakly-supervised visual recognition

1. Learning object detectors with tags

2. Learning visual attributes with pairwise comparisons

3. Learning visual grounding with captions

smile

“clock tower is tall”

plane

Page 26:

Attribute: Smile

Goal of weakly-supervised attribute localization

• Localizing the relevant region improves attribute modeling
• In the relative setting, we are provided with pairwise comparisons

Page 27:

Previous approaches

• Require pre-trained part detectors or crowdsourcing [Kumar et al. ICCV 2009, Bourdev et al. ICCV 2011, Duan et al. CVPR 2012, Zhang et al. CVPR 2014]

• “Pipeline” where features, localizer, and classifier are trained separately and sequentially; suboptimal and slow [Xiao and Lee, ICCV 2015]

• Our idea: jointly learn features, localizer, and classifier end-to-end in a deep learning framework

• We focus on the relative attribute setting

Page 28:

Overview of our end-to-end approach

[Singh and Lee, “End-to-End Localization and Ranking for Relative Attributes”, ECCV 2016]

[Diagram: two Siamese branches, S1 and S2, each a Localization Network followed by a Ranker Network. Input images I1 and I2 produce scores V1 and V2, which feed the loss function. Attribute: Smile]

• Goal: Given pairs of ordered training images, simultaneously localize attribute in each image and learn a ranker
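The ordered training pairs can supervise the ranker through a hinge-style pairwise ranking loss: the image showing the attribute more strongly should score higher by some margin. A minimal sketch of this idea (a simplified stand-in, not the paper's exact loss):

```python
def pairwise_ranking_loss(score_strong, score_weak, margin=1.0):
    """Loss for one ordered pair: zero when the 'stronger' image
    already outscores the 'weaker' one by at least `margin`,
    otherwise linear in the violation."""
    return max(0.0, margin - (score_strong - score_weak))
```

During training, this loss (summed over pairs) is backpropagated through both Siamese branches, so the localizer and ranker are updated jointly.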

Page 29:

Our end-to-end approach

[Diagram: input image I passes through a Localization Network (a conv net) that predicts transformation parameters θ for a grid generator; the cropped region passes through a Ranker Network (conv + fc layers) to produce a single ranking score V]

• The ranker network takes the localized region and produces a ranking score
• The global image is also included for global context

Page 30:

Attribute: Smile

Attribute: Dark hair

Training epochs

• Heatmap: distribution of localized region across entire training dataset

Progression of localized region over training epochs

Page 31:

[Diagram: test image → Localization Network → Ranker Network (Siamese branch S1) → score Vtest]

Testing

• Localize the relevant attribute region
• Produce a ranking score for the test image

Test image

Page 32:

Results: Discovered regions and ranking on LFW-10 Faces (Weak → Strong)

Bald

Dark hair

Eyes open

• Our network discovers relevant attribute regions
• Leads to accurate rankings

Smile

Page 33:

Weak → Strong

Open

Pointy

Sporty

Comfort

Results: Discovered regions and ranking on UT-Zap50K Shoes

Page 34:

Results: Image pair ranking accuracy

• % of test image pairs whose predicted relative attribute ranking is correct
• State-of-the-art results on LFW-10, UT-Zap50K, OSR, Shoe-with-Attribute
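The evaluation metric is simple to state precisely: for each ordered ground-truth test pair, check whether the predicted score of the "stronger" image exceeds that of the "weaker" one. A minimal sketch:

```python
def pair_accuracy(scores_strong, scores_weak):
    """Fraction of ordered test pairs whose predicted ranking agrees
    with ground truth (the image labeled stronger scores higher).

    scores_strong[i] and scores_weak[i] are the predicted scores of
    the i-th pair's stronger and weaker image, respectively.
    """
    correct = sum(s > w for s, w in zip(scores_strong, scores_weak))
    return correct / len(scores_strong)
```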

Page 35:

Overview: Weakly-supervised visual recognition

1. Learning object detectors with tags

2. Learning visual attributes with pairwise comparisons

3. Learning visual grounding with captions

smile

“clock tower is tall”

plane

Page 36:

car, plane, cow

image + tag

Bounding box detections

Deselaers et al. 2010, Pandey & Lazebnik 2011, Song et al. 2014, Singh et al. 2016, ...

Localizing objects

Tag-based weakly supervised localization

Page 37:

Xu et al. 2014, Pathak et al. 2015, Pinheiro & Collobert 2015, ...

Semantic segmentation

Tag-based weakly supervised localization

Page 38:

Tags vs. natural language

Page 39:

Tags vs. natural language

Tags: "cat", "hand", "donut"

Page 40:

Tags convey only a limited amount of information about images

Tags vs. natural language

Tags: "cat", "hand", "donut"

Page 41:

Caption: "A grey cat staring at a hand with a donut."

Tags vs. natural language

Page 42:

It is much more informative and natural to describe images with a caption

Tags vs. natural language

Caption: "A grey cat staring at a hand with a donut."

Page 43:

A grey cat staring at a hand with a donut.

Our goal: Leverage structure in natural language for weakly supervised visual grounding

[Xiao, Sigal, Lee, “Weakly-supervised Visual Grounding of Phrases with Linguistic Structures”, CVPR 2017]

Page 44:

A grey cat staring at a hand with a donut.

Our goal: Leverage structure in natural language for weakly supervised visual grounding

[Xiao, Sigal, Lee, “Weakly-supervised Visual Grounding of Phrases with Linguistic Structures”, CVPR 2017]

Page 45:

Visual grounding of free-form language with only image-level supervision

A grey cat staring at a hand with a donut.

Our goal: Leverage structure in natural language for weakly supervised visual grounding

[Xiao, Sigal, Lee, “Weakly-supervised Visual Grounding of Phrases with Linguistic Structures”, CVPR 2017]

Page 46:

• Exploit linguistic structure in the caption
• Parse the caption into a syntactic parse tree [Socher et al. 2013]

A grey cat staring at a hand with a donut.

Our goal: Leverage structure in natural language for weakly supervised visual grounding

Page 47:

(1) Implies sibling exclusivity: Image regions for sibling nodes should be exclusive

Key idea: Transfer linguistic structure to visual domain

A grey cat staring at a hand with a donut.

Page 48:

(2) Implies parent-child inclusivity: Image region of parent node is union of children's regions

A grey cat staring at a hand with a donut.

Key idea: Transfer linguistic structure to visual domain
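The two structural constraints above (sibling exclusivity and parent-child inclusivity) can be expressed as penalties on the soft attention masks predicted for each parse-tree node. A minimal numpy sketch, assuming masks with values in [0, 1]; these are illustrative penalty terms, not the paper's exact loss functions:

```python
import numpy as np

def sibling_exclusivity(mask_a, mask_b):
    """Penalize spatial overlap between the masks of two sibling
    nodes: their grounded regions should be mutually exclusive."""
    return float(np.mean(mask_a * mask_b))

def parent_child_inclusivity(parent_mask, child_masks):
    """Penalize deviation of the parent node's mask from the soft
    union (element-wise max) of its children's masks."""
    union = np.maximum.reduce(child_masks)
    return float(np.mean((parent_mask - union) ** 2))
```

Both terms are zero exactly when the structure is respected: sibling masks do not overlap, and the parent's mask equals the union of its children's.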

Page 49:

Four submodules: (1) visual encoder, (2) language encoder, (3) semantic embedding module, and (4) loss functions.

Network architecture

Page 50:

Phrase grounding on Visual Genome

Left: our model
Right: baseline trained without structural constraints (discriminative loss only)

“the train is on the bridge” / “a person driving a boat” / “a jockey riding a horse”

Page 51:

Multiple queries on Visual Genome

“snow covered park”, “bench is black and white”, “clock tower is tall”, “buildings by street”, “the walls are dark purple”, “yellow and orange cat”

Our groundings generated for different phrases of the same image

Page 52:

Multiple queries on MS COCO

pizza, person, bottle, keyboard, bus, sheep

Our groundings generated for different tags of the same image

Page 53:

Phrase grounding on Visual Genome

Page 54:

Animal keypoint detection

Input Output

Page 55:

Motivation

• Veterinary research has found facial expressions to be a good indicator of pain in horses, sheep, and cows

K. B. Gleerup, B. Forkman, C. Lindegaard, and P. H. Andersen. An equine pain face. Veterinary anesthesia and analgesia, 42(1):103–114, 2015

Page 56:

Motivation

• Automatic pain detection can have a huge impact on animal welfare
  – $0.5 billion is spent annually in the US to treat lameness in horses

• Human detection of animal pain is challenging
  – Humans need to be trained to detect pain
  – Livestock may hide expressions of pain near humans

Page 57:

Key idea

• Leverage existing human training data for animal facial keypoint detection (with a small amount of animal training data)

• Warp animal faces to resemble human shape

Maheen Rashid, Xiuye Gu, Yong Jae Lee. Interspecies Knowledge Transfer. CVPR 2017.

Page 58:

Approach

• The warping network warps the animal face to have a human-like shape

• The keypoint network predicts facial keypoints on the warped animal face
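The warping idea can be illustrated with a much simpler stand-in than the learned warping network: fit a least-squares affine transform that maps a set of animal facial keypoints toward a reference (e.g., mean human) keypoint configuration. The function names and the affine (rather than learned, non-rigid) warp are assumptions for illustration only:

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src_pts (Nx2) toward
    dst_pts (Nx2), e.g. animal keypoints toward a human template."""
    n = len(src_pts)
    A = np.hstack([src_pts, np.ones((n, 1))])  # homogeneous coords, (N, 3)
    # Solve A @ M ≈ dst_pts for the 3x2 affine matrix M.
    M, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)
    return M

def apply_affine(M, pts):
    """Apply the fitted 3x2 affine matrix to points (Nx2)."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M
```

The actual warping network in the paper is trained end-to-end and can represent richer, non-rigid deformations; this sketch only shows the geometric alignment step conceptually.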

Page 59:

Experiments

• Horse Facial Keypoint Dataset
  – 3,531 training, 186 testing images
  – Left eye, right eye, nose center, left mouth corner, right mouth corner

• Sheep Facial Keypoint Dataset [Yang et al. 2016]
  – 432 training, 99 testing images
  – Same 5 keypoints as above

Page 60:

Qualitative results

Ground truth | Our warp | Our pred | No warp

Page 61:

Qualitative results

Ours

Yang et al. 2016

• By leveraging existing human training data, our CNN model performs much better

Page 62:

Quantitative results

• Triplet Interpolated Features [Yang et al. 2016]

Horse | Sheep

Page 63:

Conclusions

• Weakly-supervised learning enables scalable visual recognition
  – Object detection, attribute recognition, visual grounding

• Still, challenges remain in dealing with
  – Noise in data
  – Limited training data

Page 64:

Coming up

• Sign-up for paper
• Next class: CNN/PyTorch/Torch tutorial