
Deep Learning for Computer Vision
Pr. Jenny Benois-Pineau
LABRI UMR 5800/Université Bordeaux
Chapter 6. Temporal aspects. Applications

Chapter 6

Summary.

1.  Temporal aspects: RNN, LSTM
2.  Applications. 3D Conv.nets…

Video Analysis & Coding/Computer Vision 2

1. RNN

➔  Recurrent neural networks (RNNs) are a family of neural networks for processing sequential data.

➔  Formally, an RNN is a neural network specialized for processing a sequence of values $x^{(1)}, x^{(2)}, \ldots, x^{(\tau)}$

➔  Advantage: sharing parameters across different parts of the model (the same parameters are applied to the observations at different times)

➔  We consider an RNN operating on a sequence of vectors $x^{(1)}, x^{(2)}, \ldots, x^{(\tau)}$

➔  In practice, RNNs usually operate on minibatches of such sequences.

Rumelhart, D.E., McClelland, J.L., and the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge

The idea of computational graph unfolding

➔  Consider the classical form of a dynamical system (in computer vision, for example, a dynamic model of a moving object in a video sequence, e.g. with constant velocity):

$s^{(t)} = f(s^{(t-1)}; \theta)$

➔  $s^{(t)}$ is called the state of the system

➔  The equation is recurrent: the state at time $t$ is defined from the state at time $t-1$

Unfolding

➔  For a finite number of time steps, the recurrence can be unfolded, e.g. for three time steps:

$s^{(3)} = f(s^{(2)}; \theta) = f(f(s^{(1)}; \theta); \theta)$

➔  Unfolding the equation by repeatedly applying the definition in this way yields an expression that does not involve recurrence.

[Figure: unfolded computational graph $s^{(\ldots)} \to s^{(t-1)} \to s^{(t)} \to s^{(t+1)} \to s^{(\ldots)}$, where each arrow applies $f$]
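The unfolding idea can be sketched in a few lines, assuming the constant-velocity example mentioned earlier: the state is taken to be (position, velocity) and θ is taken to be the time step. The function names `f` and `unfold` and all numeric values are illustrative assumptions, not part of the course material.

```python
import numpy as np

def f(s, theta):
    """One transition of a constant-velocity model; theta is the time step."""
    position, velocity = s
    return np.array([position + theta * velocity, velocity])

def unfold(s1, theta, t):
    """Compute s(t) from s(1) by applying f exactly t - 1 times."""
    s = np.asarray(s1, dtype=float)
    for _ in range(t - 1):
        s = f(s, theta)
    return s

s1 = np.array([0.0, 2.0])  # assumed initial state: position 0, velocity 2
theta = 0.5                # assumed time step

# Unfolded expression for three time steps: s(3) = f(f(s(1); theta); theta)
s3_unfolded = f(f(s1, theta), theta)
# Same result obtained from the recurrent definition
s3_loop = unfold(s1, theta, 3)
```

Both computations give the same state, which is the point of unfolding: the recurrent definition becomes a plain chain of applications of the same function with the same parameters.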

RNN as

➔  The equation using external signals $x^{(t)}$ ($h$ is the state):

$h^{(t)} = f(h^{(t-1)}, x^{(t)}; \theta)$

It is possible to use the same transition function $f$ with the same parameters $\theta$ at every time step

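[Figure: unfolded graph $h^{(\ldots)} \to h^{(t-1)} \to h^{(t)} \to h^{(t+1)} \to h^{(\ldots)}$; each state $h^{(t)}$ also receives the input $x^{(t)}$, and each transition applies $f$]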

And finally

➔  Unfolded recurrent network with Loss

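[Figure: unfolded recurrent network. Each input $x^{(t)}$ feeds the hidden state $h^{(t)}$; hidden states are connected through time by the weights $W$; each $h^{(t)}$ produces an output $o^{(t)}$, which the loss $L^{(t)}$ compares with the target $y^{(t)}$]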

Equations of RNN

➔  Forward propagation:

$a^{(t)} = b + W h^{(t-1)} + U x^{(t)}$

$h^{(t)} = f(a^{(t)})$

$o^{(t)} = c + V h^{(t)}$

➔  Parameter estimation: backpropagation and gradient descent

➔  Difficult to train: gradients propagated over many time steps tend to vanish or explode
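A minimal NumPy sketch of this forward propagation, following the three equations for a(t), h(t) and o(t). The choice f = tanh, the layer sizes and the random initialisation are assumptions made for illustration; the slides leave f generic.

```python
import numpy as np

def rnn_forward(xs, h0, U, W, V, b, c, f=np.tanh):
    """Forward pass over a sequence xs of shape (tau, n_in)."""
    h = h0
    hs, outputs = [], []
    for x in xs:
        a = b + W @ h + U @ x   # a(t) = b + W h(t-1) + U x(t)
        h = f(a)                # h(t) = f(a(t))
        o = c + V @ h           # o(t) = c + V h(t)
        hs.append(h)
        outputs.append(o)
    return np.stack(hs), np.stack(outputs)

rng = np.random.default_rng(0)
tau, n_in, n_hidden, n_out = 5, 3, 4, 2   # assumed sizes
U = rng.normal(scale=0.1, size=(n_hidden, n_in))
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
V = rng.normal(scale=0.1, size=(n_out, n_hidden))
b = np.zeros(n_hidden)
c = np.zeros(n_out)

xs = rng.normal(size=(tau, n_in))
hs, outputs = rnn_forward(xs, np.zeros(n_hidden), U, W, V, b, c)
```

The same matrices U, W and V are reused at every time step, which is the parameter sharing described at the start of the chapter.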

LSTM – Long Short-Term Memory

➔  Gated RNNs

➔  The idea: creating paths through time whose derivatives neither vanish nor explode

➔  Connection weights may change at each time step

➔  For video analysis, LSTMs have largely been replaced by 3D convolutional neural networks

Hochreiter and Schmidhuber, 1997
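The slides do not give the LSTM equations. The sketch below therefore assumes the commonly used variant with input, forget and output gates, which may differ in detail from the 1997 formulation. The stacked weight layout and all names are illustrative assumptions. The example closes the input gate and opens the forget gate to show that the cell state is then carried through time almost unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4n, n_in), U: (4n, n), b: (4n,)."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])           # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:4 * n])   # candidate cell update
    c = f * c_prev + i * g        # additive, gated path through time
    h = o * np.tanh(c)
    return h, c

n_in, n = 2, 3                    # assumed sizes
W = np.zeros((4 * n, n_in))
U = np.zeros((4 * n, n))
b = np.zeros(4 * n)
b[0:n] = -50.0                    # input gate (almost) closed
b[n:2 * n] = 50.0                 # forget gate (almost) fully open

c_prev = np.array([0.5, -1.0, 2.0])
h, c = lstm_step(np.ones(n_in), np.zeros(n), c_prev, W, U, b)
```

Because the gates are computed from the current input and the previous state, the effective weight on the path from one cell state to the next can change at each time step.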

Goal: improve athletes' performances through tools for teachers and athletes

CBMI - September 6th, 2018 Sport Action Recognition with Siamese Spatio-Temporal CNNs: Application to Table Tennis


[Figure: input video → output stroke label, e.g. "Offensive Forehand Loop"]

-  Extract strokes in the temporal dimension

-  Classify the strokes

1 - A new dataset : TTStroke-21

-  129 videos at 120 fps
-  1 387 / 1 074 annotations before / after filtering, for 20 classes
-  1 048 strokes + 272 negative samples extracted

[Figures: acquisition, annotation platform, and samples from TTStroke-21]

2 - Related Work

-  Use of Dynamic Images [1]
-  Very deep 3D CNN [2]
-  Long-term Temporal Convolutions [3]

[1] H. Bilen, B. Fernando, E. Gavves, and A. Vedaldi, “Action recognition with dynamic image networks,” CoRR, vol. abs/1612.00738, 2016.
[2] J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the kinetics dataset,” CoRR, vol. abs/1705.07750, 2017.
[3] G. Varol, I. Laptev, and C. Schmid, “Long-term temporal convolutions for action recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1510–1517, 2018.

3 - Proposed method

Goal: good classification of the extracted strokes

-  Use of a deep learning model

-  Need for temporal and spatial segmentation

-  Data augmentation

Best accuracy: 91.4% against 43.1% for the state-of-the-art method [2]

[2] J. Carreira and A. Zisserman, “Quo vadis, action recognition? A new model and the kinetics dataset,” CoRR, vol. abs/1705.07750, 2017.


3.a - Model Architecture

Siamese Spatio-Temporal Convolutional Neural Network

Input: (W,H,T) = (100,120,120)

Training :

-  Stochastic gradient descent
-  Cross-entropy loss: $\mathrm{loss}(x, \mathrm{class}) = -x[\mathrm{class}] + \log \sum_j \exp(x[j])$
-  Learning rate: 0.001 for the Siamese network and 0.01 for one branch
-  Nesterov momentum
-  Epochs: 2000
-  Momentum: 0.5, decreased to 0.1 and 0.05 at epochs 1000 and 1500
-  Datasets: training 70%, validation 20%, test 10%
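The cross-entropy loss listed in the training settings, -x[class] + log(sum_j exp(x[j])), can be evaluated as in the following sketch, where x is a vector of raw class scores. The function name, the example scores and the max-shift used for numerical stability are assumptions added for illustration.

```python
import numpy as np

def cross_entropy(x, target):
    """loss = -x[target] + log(sum_j exp(x[j])) for one vector of class scores."""
    x = np.asarray(x, dtype=float)
    m = x.max()                                     # shift to avoid overflow in exp
    log_sum_exp = m + np.log(np.exp(x - m).sum())
    return -x[target] + log_sum_exp

scores = np.array([2.0, 0.5, -1.0])   # assumed raw scores for 3 classes
loss = cross_entropy(scores, target=0)
```

The loss equals the negative log of the softmax probability of the target class, so it is small when the target class has the largest score and equals log(K) when all K scores are identical.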

[Figure: 3D convolutions*]

* “IMAGE SUPER RESOLUTION KERAS” from impremedia.net
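To make the 3D convolution concrete, here is a naive single-channel sketch applied to a volume with the input size (W,H,T) = (100,120,120) given above. It uses the cross-correlation form found in most deep learning libraries, without padding or stride. The 3×3×3 averaging kernel is an assumption; the slides do not specify the filter sizes of the model.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Single-channel 3D cross-correlation without padding ('valid' mode)."""
    kw, kh, kt = kernel.shape
    W, H, T = volume.shape
    out = np.zeros((W - kw + 1, H - kh + 1, T - kt + 1))
    # Accumulate one shifted copy of the volume per kernel coefficient
    for i in range(kw):
        for j in range(kh):
            for k in range(kt):
                out += kernel[i, j, k] * volume[i:i + out.shape[0],
                                                j:j + out.shape[1],
                                                k:k + out.shape[2]]
    return out

volume = np.ones((100, 120, 120))      # (W, H, T) as in the model input
kernel = np.ones((3, 3, 3)) / 27.0     # assumed 3x3x3 averaging filter
out = conv3d_valid(volume, kernel)     # shape (98, 118, 118)
```

Unlike a 2D convolution applied frame by frame, the kernel also extends along T, so each output value mixes information from neighbouring frames.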

3.b - Input Data


[Figure panels: original frame, motion estimation [4], foreground estimation [5], foreground motion]

[4] C. Liu, “Beyond pixels: Exploring new representations and applications for motion analysis,” Ph.D. dissertation, Massachusetts Institute of Technology, 5 2009.
[5] Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognition Letters, vol. 27, no. 7, pp. 773–780, 2006.


3.b - Input Data

Spatial Segmentation using foreground motion

[Figure labels: $X_{max}$, $X_g$, final segmentation]

Smoothing over the temporal dimension using a Gaussian kernel of size 40 and standard deviation 4.44.
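A possible implementation of this temporal smoothing is sketched below, using the stated kernel size (40) and standard deviation (4.44). The slides do not say exactly which quantity is smoothed. The sketch assumes a one-dimensional track with one value per frame, such as a coordinate of the segmented region, and replicates the edge values at the sequence boundaries. Both choices are assumptions.

```python
import numpy as np

def gaussian_kernel(size=40, sigma=4.44):
    """Normalised 1D Gaussian kernel centred on the middle of the window."""
    t = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def smooth_over_time(track, size=40, sigma=4.44):
    """Smooth a per-frame track, keeping its length (edge values replicated)."""
    k = gaussian_kernel(size, sigma)
    left = size // 2
    right = size - 1 - left
    padded = np.pad(track, (left, right), mode="edge")
    return np.convolve(padded, k, mode="valid")

rng = np.random.default_rng(0)
frames = 120
# Hypothetical noisy per-frame coordinate of the region of interest
track = 50.0 + rng.normal(scale=5.0, size=frames)
smoothed = smooth_over_time(track)
```

With an even kernel size the window cannot be perfectly centred on a frame, so the output is offset by half a frame, which is negligible at 120 fps.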

3.c - Data Augmentation

Online augmentation, applied before spatial segmentation to avoid padding.

Spatial :

-  random rotation in range ±10°
-  random translation in range ±0.1 in x and y directions
-  random homothety (scaling) in range 1 ± 0.1

Temporal :

-  100 successive frames, with the 50th frame selected according to a normal probability distribution along the temporal dimension of the extracted stroke
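The augmentation parameters above could be sampled as in the following sketch. Several details are assumptions because the slides do not state them: translations are read as fractions of the frame size, the spread of the normal distribution (`rel_std`) is arbitrary, the frame size 320×180 is hypothetical, and the window is clipped so that it stays inside the video.

```python
import numpy as np

def sample_spatial_params(rng):
    """Draw rotation (radians), translation (fractions) and scale factor."""
    angle = np.deg2rad(rng.uniform(-10.0, 10.0))
    tx, ty = rng.uniform(-0.1, 0.1, size=2)
    scale = rng.uniform(0.9, 1.1)
    return angle, tx, ty, scale

def affine_matrix(angle, tx, ty, scale, width, height):
    """3x3 matrix: rotate and scale about the image centre, then translate."""
    cx, cy = width / 2.0, height / 2.0
    ca, sa = scale * np.cos(angle), scale * np.sin(angle)
    return np.array([
        [ca, -sa, cx - ca * cx + sa * cy + tx * width],
        [sa,  ca, cy - sa * cx - ca * cy + ty * height],
        [0.0, 0.0, 1.0],
    ])

def sample_window(stroke_start, stroke_end, n_frames, rng,
                  length=100, rel_std=0.15):
    """Indices of `length` successive frames whose 50th frame is drawn from a
    normal distribution centred on the middle of the stroke."""
    centre = 0.5 * (stroke_start + stroke_end)
    std = rel_std * (stroke_end - stroke_start)    # assumed spread
    mid = int(round(rng.normal(centre, std)))
    start = mid - (length // 2 - 1)                # 50th frame = index start + 49
    start = int(np.clip(start, 0, n_frames - length))
    return np.arange(start, start + length)

rng = np.random.default_rng(0)
width, height = 320, 180                           # hypothetical frame size
angle, tx, ty, scale = sample_spatial_params(rng)
M = affine_matrix(angle, tx, ty, scale, width, height)
window = sample_window(stroke_start=200, stroke_end=350, n_frames=1000, rng=rng)
```

The same matrix `M` would be applied to every frame of the sampled window so that the augmented clip stays temporally consistent.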

4 - Results

Training of our SSTC model

4 - Results

Training of the I3D model

4 - Results

Conclusion

➔  This course is very far from being complete

➔  It was an attempt to give fundamentals and some examples from the author's research

➔  Happy adventure with Deep Learning for your visual data and your problems.

➔  Jenny Benois-Pineau
