
CAP 6412 Advanced Computer Vision

http://www.cs.ucf.edu/~bgong/CAP6412.html

Boqing Gong, Jan 19, 2016

Today

• Administrivia
• Neural networks & backpropagation (Part II)
• Deep residual learning, by Dustin

Assignment 1 due at 3pm, 01/21 (Thursday)

• Review the following paper:

[Visualization] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." In Computer Vision - ECCV 2014, pp. 818-833. Springer International Publishing, 2014.

Template for the paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx

Use the "late homework policy" wisely

- Three late days in total for all reports and projects
- Counting at the granularity of 12 hours
- No additional late days

• Some are late for the one-point assignment "Topic Preference List"
• To lose 1 point? (Default)
• OR, to earn 1 point and to trigger the late homework policy? (Send me an email)

Email: the best way to reach me

• bgong@crcv.ucf.edu (preferred)
• DO NOT leave messages under my announcements

• Put [CAP6412] in the subject line
• Summarize the message in the subject line
• Ex: [CAP6412] Meeting request: Thursday (Jan 14) 4:30pm?

Office hours this week

• Tuesday, 4:30-5:30pm → Thursday, 4:30-5:30pm
• HEC 214

This week: CNN visualization & object recognition

Tuesday (01/19)

Dustin Morley

[ILSVRC] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision (2014): 1-42.
[152 layers] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." arXiv preprint arXiv:1512.03385 (2015).

Thursday (01/21)

Jason Tiller

[Visualization] Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." In Computer Vision - ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." arXiv preprint arXiv:1412.6856 (2014).

Next week: CNN & object localization

Tuesday (01/26)

Samer Iskander

J. Hosang, R. Benenson, and B. Schiele. "How good are detection proposals, really?" BMVC 2014.
{Major} J. Hosang, R. Benenson, P. Dollár, and B. Schiele. "What makes for effective detection proposals?" PAMI 2015.
{Major} [Faster R-CNN] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. "Faster R-CNN: Towards real-time object detection with region proposal networks." In Advances in Neural Information Processing Systems, pp. 91-99. 2015.

Thursday(01/28)

Syed Ahmed

{Major} [R-CNN] Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. "Rich feature hierarchies for accurate object detection and semantic segmentation." In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587. IEEE, 2014.
[Fast R-CNN] Girshick, Ross. "Fast R-CNN." arXiv preprint arXiv:1504.08083 (2015).

The link has been sent to your UCF emails

Today

• Administrivia
• Neural networks & backpropagation (Part II)
• Fundamentals of Convolutional Neural Networks (CNN), by Fareeha

Review: biological neurons

• The human brain has about 10 billion neurons
• Each is connected to 10K other neurons
• A neuron fires if the sum of electrochemical inputs exceeds some threshold

Image credit: cs.stanford.edu/people/eroberts


Review: artificial neurons / perceptrons

• A neuron fires if the sum of weighted inputs exceeds some threshold

Image credit: www.hiit.fi/u/ahonkela/dippa/node41.html

y = φ(∑_{i=1}^{n} w_i x_i + b) = φ(wᵀx + b)

φ(·) : the activation function
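To make the equation concrete, here is a minimal NumPy sketch of a single artificial neuron with a logistic activation; the input, weight, and bias values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, phi=sigmoid):
    """Artificial neuron: y = phi(w^T x + b)."""
    return phi(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs (made-up values)
w = np.array([0.4, 0.6, -0.2])   # weights (made-up values)
b = 0.1                          # bias
print(neuron(x, w, b))           # a value in (0, 1)
```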

Constructing neural networks from neurons

• The human brain has about 10 billion neurons
• Each is connected to 10K other neurons
• A neuron fires if the sum of electrochemical inputs exceeds some threshold

Image credit: cs.stanford.edu/people/eroberts

Basic network structures

• Feed-forward networks
• Recurrent neural networks

Image credit: http://mesin-belajar.blogspot.com/2016/01/a-brief-history-of-neural-nets-and-deep_84.html

Imposing desired properties

• To tune the network towards desired properties
• E.g., for binary classification:
  • Output between 0 and 1
  • Tells the probability of the input x belonging to class +1 or -1

Image credit: Farid E Ahmed

A case study

• Binary classification
  • Output between 0 and 1
  • Tells the probability of the input x belonging to class +1 or -1

• Step 1: choose the network structure
• Step 2: choose the activation function
• Step 3: determine the model parameters Θ to meet the desired properties (see the sketch below)

Image credit: Farid E Ahmed
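To make the three steps concrete, here is a minimal NumPy sketch of one possible choice: a single hidden layer (step 1), logistic activations (step 2), and randomly initialized parameters Θ that would still have to be learned (step 3). All sizes and values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_classifier(x, theta):
    """Step 1: structure -- one hidden layer feeding a single output unit.
    Step 2: activation -- logistic, so the output lies in (0, 1).
    Step 3: parameters theta = (W1, b1, w2, b2) still have to be learned."""
    W1, b1, w2, b2 = theta
    h = sigmoid(W1 @ x + b1)     # hidden layer
    return sigmoid(w2 @ h + b2)  # interpreted as P(y = +1 | x)

rng = np.random.default_rng(0)
theta = (rng.normal(size=(4, 2)), np.zeros(4), rng.normal(size=4), 0.0)
print(binary_classifier(np.array([1.0, -0.5]), theta))
```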

[Figure: plots of four activation functions over x in [-10, 10]: Binary step, Logistic, TanH, and Rectified Linear Unit (ReLU)]

φ(x) = 1 / (1 + exp(-x))
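For reference, the four activations plotted above as a small NumPy sketch (the evaluation points are arbitrary):

```python
import numpy as np

def binary_step(x):
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-10, 10, 5)  # a few sample points in [-10, 10]
for phi in (binary_step, logistic, tanh, relu):
    print(phi.__name__, phi(x))
```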

Learning the model parameters Θ (1)

• Is equivalent to: choose one hypothesis h ∈ H to approximate the concept c
• where
  - binary classification concept: c : X → Y = {0, 1}
  - hypotheses: H = { net(Θ) | Θ ∈ R^d }
• Questions:
  - c is unknown
  - is c ∈ H?

Empirical Risk Minimization (ERM)

Learning the model parameters Θ (2)

• Is equivalent to: choose one hypothesis h ∈ H to approximate the concept c
• Can be implemented by:

Θ* = argmin_Θ R(Θ)

R(Θ) = Pr( net(x; Θ) ≠ y ) = E_{(x,y)∼P_XY}[ net(x; Θ) ≠ y ]

where P_XY is the underlying distribution of (x, y); R(Θ) is called the generalization risk

Next class:

Θ* = argmin_Θ E_{(x,y)∼P_XY}[ net(x; Θ) ≠ y ]
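ERM itself is deferred to the next class, but as a preview, here is a minimal sketch of the empirical counterpart of R(Θ): since P_XY is unknown, the expectation is replaced by an average over training samples. The linear `net` and the toy data are hypothetical stand-ins, not from the slides.

```python
import numpy as np

def empirical_risk(net, X, Y, theta):
    """Empirical 0-1 risk: the fraction of samples the network misclassifies,
    a sample estimate of R(theta) = Pr(net(x; theta) != y)."""
    predictions = np.array([net(x, theta) for x in X])
    return np.mean(predictions != Y)

def net(x, theta):
    """Hypothetical stand-in hypothesis: a linear classifier."""
    w, b = theta
    return 1 if np.dot(w, x) + b >= 0 else 0

X = np.array([[0.0, 1.0], [2.0, 2.0], [-1.0, -1.0]])  # toy inputs
Y = np.array([1, 1, 0])                               # toy labels
theta = (np.array([1.0, 1.0]), 0.0)
print(empirical_risk(net, X, Y, theta))  # 0.0: all three are classified correctly
```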

Today

• Administrivia
• Neural networks & backpropagation (Part II)
• Deep residual learning, by Dustin

Deep Residual Learning for Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (Microsoft Research)

Presented by Dustin Morley

About the paper

• NOT peer-reviewed: published on arXiv (Dec. 2015)

• Well-supported claim: "We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth."

• Questionable claim: "...residual nets with a depth of up to 152 layers - 8x deeper than VGG nets but still having lower complexity"
  • The claim of lower complexity is not convincingly supported

• My rating: 2
  • Great innovation with high significance, but the claims and the presentation of the experimental data are not organized that well.

Main Contributions

• Proposed a novel approach to resolve the issue of performance degradation with increased depth

• Obtained excellent object recognition and localization results
  • Ensemble network on the ImageNet dataset: 3.57% top-5 classification error
  • 101-layer ResNet on the COCO validation set (object detection): 27.2% mAP@[0.5, 0.95]

• Won 1st place in several tracks of the ILSVRC and COCO 2015 competitions:
  • ImageNet detection
  • ImageNet localization
  • COCO detection
  • COCO segmentation

Outline

• Background: theoretical and experimental
• Problem: NN scalability with added layers
• Solution: residual learning via identity-mapping "shortcuts"
• Experimental results
• Conclusion, evaluation, and future directions

Background

• Convolutional Neural Networks
  • Layers: conv., pool, conv., pool, conv., pool, ...
  • Conv./pool results are propagated forward
  • Classification error is propagated backward
    • Each layer computes error derivatives with respect to its parameters

Image Credit: Oxford Visual Geometry Group

Background – ImageNet 2012

• Dataset for image classification
• 1000 classes
• 1.28 million training images
• 50k validation images
• 100k test images (final results)
• Top-1 and top-5 error rates

Background – CIFAR-10 Testing

• Dataset for image classification
• Images are small (32x32, color)
• 10 classes
• 50k training images
• 10k test images (final result)

Background – MS COCO Testing

• Dataset for object detection
• 80 object categories
• 80k training images
• 40k test images
• Detailed manual segmentations of images

• Evaluation metrics revolve around mean average precision (mAP) and intersection over union (IoU); see the sketch after this list
  • Partition results into IoU threshold ranges ([0.5, 0.55], [0.55, 0.6], ..., [0.95, 1])
  • Compute the average precision for each range
  • Compute the mean of the average precisions over all ranges
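As an illustration of the metric, a small Python sketch of IoU and of averaging AP over the IoU thresholds; `average_precision_at` is a hypothetical helper, not part of any particular library.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143

# mAP@[0.5, 0.95]: average AP over IoU thresholds 0.5, 0.55, ..., 0.95,
# where a detection counts as correct only if its IoU with the ground
# truth reaches the threshold. average_precision_at is hypothetical:
thresholds = np.arange(0.5, 1.0, 0.05)
# map_coco = np.mean([average_precision_at(t) for t in thresholds])
```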

Background – PASCAL VOC Testing

• Dataset for object detection
• 16k training images from VOC 2012
• First test set: 5k test images from VOC 2007
• Second test set: 10k test images from VOC 2012
• Evaluation metric: similar to MS COCO (but not exactly the same)

Problem – about adding layers…

• In theory: the only risk is overfitting
• In practice: multiple issues
  • Convergence (mostly solved by normalization layers)
  • Accuracy degradation (even the training accuracy degrades!)

Too many layers?

• Theory: more layers should never harm training performance
  • Take the solution for m layers; add more layers configured such that they only perform the identity operation: same performance.
  • Thus, an equivalent or better solution always exists when more layers are added
• Implication: optimization methods cannot handle too many layers
• Need a reformulation of the extra layers that makes optimization easier

Solution: Residual Network

• Conjecture: it is difficult for optimization to deduce "unneeded layers"
  • Equivalently: it is difficult to determine that a layer should be an identity mapping
• Recast the initial condition so that the result under the identity mapping is visible
• Use "shortcuts" to go "around" layers in addition to going "through" them
• Mathematically: the block outputs F(x) + x, so the stacked layers only need to fit the residual F(x) = H(x) - x rather than the full mapping H(x) (see the sketch below)
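A minimal sketch of such a block in PyTorch (which postdates these slides and is used here purely for illustration; layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers wrapped by an identity shortcut: out = F(x) + x.
    If the stacked layers are not needed, optimization can push F(x)
    toward zero, recovering the identity mapping."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the shortcut "around" the layers

block = ResidualBlock(64)
x = torch.randn(1, 64, 32, 32)
print(block(x).shape)  # torch.Size([1, 64, 32, 32])
```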

Residual Network Comparison

Implementation Details - ImageNet

• Images resized with one randomized dimension for scale augmentation
• A fixed-size crop randomly sampled from an image or its horizontal flip
• Per-pixel mean subtracted
• Color augmentation
  • According to: A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," NIPS, 2012
• Batch normalization right after each convolution, before the activation
• Learning rate divided by 10 when the error plateaus (see the sketch below)
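A rough sketch of the divide-by-10-on-plateau schedule; the function, its thresholds, and the error values below are hypothetical, chosen only to illustrate the rule.

```python
def adjust_learning_rate(lr, history, patience=3, factor=0.1, tol=1e-3):
    """Divide the learning rate by 10 when the error plateaus, i.e., it has
    not improved by more than `tol` over the last `patience` epochs.
    (A hypothetical helper sketching the schedule, not the paper's code.)"""
    if len(history) > patience and \
       min(history[-patience:]) > min(history[:-patience]) - tol:
        return lr * factor
    return lr

lr = 0.1
errors = [0.40, 0.35, 0.33, 0.3299, 0.3298, 0.3298]  # made-up error curve
print(adjust_learning_rate(lr, errors))  # lr / 10 once the curve flattens
```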

ImageNet Results – Direct Comparison

• ImageNet
• "Plain" network top-1 error: 27.94% for 18 layers, 28.54% for 34 layers
• Residual network top-1 error: 27.88% for 18 layers, 25.03% for 34 layers

ImageNet Results – High Scalability

Configuration differences among A, B, and C concern how the "shortcuts" handle changes in dimensionality (A: zero padding; B: projection applied only when dimensionality increases; C: projection always applied). A sketch of option A follows.
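A PyTorch-style sketch of option A, the parameter-free zero-padding shortcut; options B and C would instead use a 1x1 convolution projection (e.g., nn.Conv2d(64, 128, 1, stride=2)). Shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def shortcut_A(x, out_channels, stride=2):
    """Option A: identity shortcut with zero padding for the extra channels
    (adds no parameters). Spatial size is reduced by strided subsampling."""
    x = x[:, :, ::stride, ::stride]        # match the reduced spatial size
    pad = out_channels - x.size(1)         # number of channels to add
    return F.pad(x, (0, 0, 0, 0, 0, pad))  # zero-pad the channel dimension

x = torch.randn(1, 64, 32, 32)
print(shortcut_A(x, 128).shape)  # torch.Size([1, 128, 16, 16])
```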

ImageNet Results - Ensemble

• ResNet ensemble built from 6 models of different depths (only 2 of the models have depth 152)

CIFAR-10 Results

Object Detection Results

• PASCAL VOC
• MS COCO

Conclusion

• Blindly increasing the depth of CNNs can lead (counterintuitively) to decreases in performance rather than increases
• The residual "shortcut" approach allows benefit to be gained from increasing the depth of CNNs
• Networks built by the authors on this principle performed very well, winning 1st place in several competitions

Evaluation

Strengths
• Novel idea
• Solves an interesting and important problem
• The approach can be "dropped in" to virtually any CNN design
• The authors obtained very good performance

Weaknesses
• Questionable statements about complexity
• Some parts of the presentation of results are confusing
• Certain direct comparisons of results didn't seem particularly meaningful

Future Directions

• Are there other types of shortcuts, beyond the identity-mapping shortcut, that could further improve performance?
  • Could these be inferred by studying the nonlinear mappings output by successful small-depth networks?
• Insert the identity-mapping shortcuts into other neural networks.
  • The results section pinned ResNet "against" VGG and other networks; this seems like a tyranny of either/or.

Recommended