Where should saliency models look next? (UPC Reading Group)

WHERE SHOULD SALIENCY MODELS LOOK NEXT?

Zoya Bylinskii, Adrià Recasens, Ali Borji, Aude Oliva, Antonio Torralba, Frédo Durand. “Where Should Saliency Models Look Next?”, ECCV 2016.

https://www.semanticscholar.org/author/Zoya-Bylinskii/3326347

https://www.semanticscholar.org/author/Adri%C3%A0-Recasens/1814982

https://www.semanticscholar.org/author/Ali-Borji/3177797

https://www.semanticscholar.org/author/Aude-Oliva/2392359

https://www.semanticscholar.org/author/Antonio-Torralba/1690178

https://www.semanticscholar.org/author/Fr%C3%A9do-Durand/1728125

Hello! I am Junting Pan. I am here because I love to give presentations. You can find me at [email protected]

Saliency map

A saliency map is a probability distribution over image locations that describes where human observers look in an image.
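To make the "probability distribution" reading concrete, here is a minimal sketch of turning recorded fixation points into a saliency map. The function name, the Gaussian blur, and the `sigma` value are assumptions made for illustration. The slides do not say how the ground-truth maps were built.

```python
import math


def fixation_saliency_map(fixations, height, width, sigma=1.5):
    """Blur (row, col) fixation points into a map whose values sum to 1.

    Hypothetical helper: the Gaussian kernel and sigma are illustrative choices.
    """
    saliency = [[0.0] * width for _ in range(height)]
    for fy, fx in fixations:
        for y in range(height):
            for x in range(width):
                squared_distance = (y - fy) ** 2 + (x - fx) ** 2
                saliency[y][x] += math.exp(-squared_distance / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in saliency)
    if total == 0:
        # No fixations recorded: fall back to a uniform distribution.
        return [[1.0 / (height * width)] * width for _ in range(height)]
    # Normalise so the map is a probability distribution over pixels.
    return [[value / total for value in row] for row in saliency]


# Made-up fixations on an 8x8 image, given as (row, col).
demo_map = fixation_saliency_map([(2, 3), (2, 4), (6, 1)], height=8, width=8)
```

Two nearby fixations reinforce each other, so the area around row 2 ends up with more probability mass than the single fixation at (6, 1).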

It can provide important clues to human image understanding:
- Main focus
- Action or event
- Participants

Regions of interest to humans

1. Motivation

Let’s start with the first set of slides

Breakthroughs because of ...

× Increases in prediction scores had been stable for a long time since ..

× Then CNNs arrived!

× End-to-end manner: feature extraction, feature integration, and saliency prediction.

Evaluation scores have begun to saturate

“Have saliency models begun to converge on human performance and is saliency a solved problem?”

HIGHER LEVEL

- Text
- Objects of gaze and action
- Locations of motion
- People in image

A picture is worth a thousand words

A single still image can convey a complex idea, making it possible to absorb large amounts of information quickly.

2. RELATED WORK AND EVALUATION

BEST MODELS AT MIT BENCHMARK

SALICON model: CNN applied at 2 different image scales, small and big.

DeepFix: FCN built on top of VGG.

10 MOST REPRESENTATIVE IMAGES

Model rankings on these 10 images have a Spearman correlation of 0.97 with the rankings on all dataset images.
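As a reminder of what a Spearman correlation between two model rankings measures, here is a small sketch. The model scores are made up for illustration and are not benchmark results, and the closed-form formula used assumes no tied scores.

```python
def ranks(scores):
    """Return the rank of each score (1 = highest). Assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(scores_a)
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# Hypothetical scores of 5 models on the full dataset and on a small subset.
full_scores = [0.88, 0.85, 0.80, 0.78, 0.70]
subset_scores = [0.90, 0.83, 0.84, 0.75, 0.72]
rho = spearman(full_scores, subset_scores)  # two adjacent models swap places
```

A value close to 1 means the subset orders the models almost exactly as the full dataset does, which is what makes a small subset usable for closer inspection.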

3. QUANTIFYING WHERE PEOPLE & MODELS LOOK IN IMAGES

Name all image regions under the fixation map

Fixation maps thresholded at the 95th percentile

651 regions across 300 images

20 users and 2 MTurk tasks
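The thresholding step can be sketched as follows: keep only the pixels above the 95th percentile of the fixation map and group them into connected regions, which is what annotators would then be asked to name. The nearest-rank percentile, the 4-connectivity, and the function names are assumptions for illustration. The slides only state the 95th percentile threshold.

```python
import math


def percentile_threshold(fixation_map, percentile=95.0):
    """Nearest-rank percentile of all pixel values in a 2D list."""
    values = sorted(v for row in fixation_map for v in row)
    rank = max(1, math.ceil(percentile * len(values) / 100.0))
    return values[rank - 1]


def salient_regions(fixation_map, percentile=95.0):
    """Group pixels strictly above the threshold into 4-connected regions."""
    height, width = len(fixation_map), len(fixation_map[0])
    threshold = percentile_threshold(fixation_map, percentile)
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or fixation_map[y][x] <= threshold:
                continue
            # Flood fill from this pixel to collect one connected region.
            stack, region = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and not seen[ny][nx] and fixation_map[ny][nx] > threshold:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(sorted(region))
    return regions


# Made-up 10x10 fixation map: two hot spots and one weak response.
demo = [[0.0] * 10 for _ in range(10)]
demo[1][1], demo[1][2] = 0.9, 0.8
demo[7][7], demo[7][8], demo[8][8] = 0.7, 0.6, 0.5
demo[3][5] = 0.1  # falls at the threshold, so it is not kept
regions = salient_regions(demo)
```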

WHAT CAN MODELS GAIN?

Gains a model could achieve if specific regions were correctly predicted
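One way to picture this "what if" analysis: copy the ground-truth values into the model's prediction inside one region, re-score the map, and report the difference. This is only a sketch under assumptions. The slides do not say which metric or which replacement procedure was used, so NSS (Normalized Scanpath Saliency) stands in here as a familiar fixation-based score, and all names and data are hypothetical.

```python
import math


def nss(saliency, fixations):
    """Mean z-scored saliency value at the fixated (row, col) pixels."""
    values = [v for row in saliency for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return 0.0
    return sum((saliency[y][x] - mean) / std for y, x in fixations) / len(fixations)


def patch_region(prediction, ground_truth, region):
    """Copy ground-truth values into the prediction at the region's pixels."""
    patched = [row[:] for row in prediction]
    for y, x in region:
        patched[y][x] = ground_truth[y][x]
    return patched


def potential_gain(prediction, ground_truth, fixations, region):
    """Score improvement if the region alone were predicted correctly."""
    patched = patch_region(prediction, ground_truth, region)
    return nss(patched, fixations) - nss(prediction, fixations)


# Made-up 4x4 example: the model finds (0, 0) but misses the region at (3, 3).
ground_truth = [[0.0] * 4 for _ in range(4)]
ground_truth[0][0] = ground_truth[3][3] = 1.0
prediction = [[0.0] * 4 for _ in range(4)]
prediction[0][0] = 1.0
fixations = [(0, 0), (3, 3)]
gain = potential_gain(prediction, ground_truth, fixations, region=[(3, 3)])
```

Ranking region types (faces, text, objects of action) by this kind of gain indicates where a model has the most to win.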

4. The importance of people

77% correctly predicted (DeepFix)

Face saliency is underestimated when faces are small, non-frontal, or not centered in an image

Sometimes the actions in a scene are more salient to human observers than the participants, but saliency models can overestimate the relative saliency of the faces

Not all people are equally important

× Assign an importance score to each face, using the ground-truth fixation map and the predicted map.

× The relative ordering assigned by saliency models does not match the importance given by human fixations.
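The comparison can be sketched as follows: score each face from a map, sort the faces by score, and check whether the human and model orderings agree. Taking the maximum map value inside a face bounding box is an assumption made for illustration, as are the box format and the toy maps. The slides do not specify how the importance score is computed.

```python
def box_importance(saliency, box):
    """Maximum map value inside a (top, left, bottom, right) box.

    The box is half-open: rows top..bottom-1 and columns left..right-1.
    """
    top, left, bottom, right = box
    return max(saliency[y][x] for y in range(top, bottom) for x in range(left, right))


def importance_order(saliency, boxes):
    """Indices of the face boxes, from most to least important."""
    scores = [box_importance(saliency, box) for box in boxes]
    return sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)


# Two hypothetical face boxes on a 4x6 image.
faces = [(0, 0, 2, 2), (1, 4, 3, 6)]

human_map = [
    [0.9, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.2, 0.0, 0.0, 0.1, 0.2],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
model_map = [
    [0.3, 0.2, 0.0, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.0, 0.5, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

human_order = importance_order(human_map, faces)
model_order = importance_order(model_map, faces)
orders_agree = human_order == model_order
```

In this toy case observers favour the first face while the model favours the second, which is the kind of mismatch the slide describes.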

4. The informativeness of text

Understanding the text ...

× The description on a warning sign or a book is more informative to observers than the warning or book title.

× The only piece of English text ..

4. Objects of gaze and action

Objects and action

× Objects of gaze and/or action are usually missed

× Detecting objects of action remains a problem area for saliency models.

4. Conclusion

Let’s finish with one slide

Conclusion

Models continue to under-predict crucial image regions containing people, actions, and text.

These are the regions with the greatest semantic importance in an image, and they are essential for saliency applications.

“Have saliency models begun to converge on human performance and is saliency a solved problem?”

“NOT YET!”

THANKS! Any questions? Please contact the authors :)
