Dynamic Data Modeling, Recognition, and Synthesis
Rui Zhao
Thesis Defense
Advisor: Professor Qiang Ji
Contents
▪ Introduction
▪ Related Work
▪ Dynamic Data Modeling & Analysis
• Temporal localization
• Insufficient annotations
• Large intra-class variations
• Complex dynamics
▪ Summary
Overview
▪ Dynamic data arise in many domains: speech processing, computer vision, neurology, and natural language processing
Major Tasks
▪ Analyze Dynamic Data
▪ Modeling
• Provide a mathematical description of the dynamic process
▪ Analysis
• Prediction: Forecast future values of dynamic data
• Regression: Estimate a target value given dynamic data
• Classification: Divide dynamic data into different categories
• Synthesis: Generate new dynamic data
Pipeline: Collection → Annotation → Pre-processing → Modeling → Analysis
Challenges
▪ Insufficient annotation
▪ Uncertainty in data and model
▪ Intra-class variation
▪ Complex dynamics
Related Work
▪ A taxonomy of modeling techniques
• Probabilistic approaches
– Directed PGM: LDS, HMM, DBN
– Undirected PGM: DRF, RBM
– Gaussian Process
• Deterministic approaches
– Neural Networks: RNN, LSTM
– Dynamic Feature
Part 1: Dynamic Pattern Localization
▪ Problem: Temporally localize the dynamic pattern in a time series by determining its starting and ending time
▪ Applications:
• Brain computer interface (BCI)
• Speech recognition
• Event recognition
▪ Our Solution:
• Combine a dynamic model (HMM) with robust estimation (RANSAC)
[Figure: a time series whose inlier subsequence is flanked by two irrelevant segments]
Methods
▪ Overview: from the set of complete sequences, repeatedly draw the s-th subset of subsequences, train HMM_s on it, and keep the best-voted model HMM*
• p: probability of selecting all inliers
• ε: proportion of outliers
• K: total number of complete sequences
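Under these definitions, the number of random trials needed can be estimated with the standard RANSAC bound (a sketch of the usual stopping rule; the thesis may use a different one):

```python
import math

def ransac_trials(p: float, eps: float, K: int) -> int:
    """Number of random subsets needed so that, with probability p,
    at least one subset contains only inlier subsequences.
    eps: proportion of outliers; K: number of complete sequences
    (one subsequence drawn per sequence)."""
    # Probability that a single draw of K subsequences is all-inlier.
    w = (1.0 - eps) ** K
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))

# e.g. p = 0.99, 20% outliers, K = 10 sequences
n_trials = ransac_trials(0.99, 0.2, 10)
```

More outliers or more sequences per draw raises the required trial count sharply, which is why the subsequence voting in Step 3 matters.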
Methods
▪ Step 1: Divide each complete sequence of length L into overlapping subsequences of length M, giving L − M + 1 subsequences per sequence
Methods
▪ Step 2: Randomly select a subsequence from each of the K complete sequences to train an HMM
• Learning: EM algorithm
Methods
▪ Step 3: Vote for the learned HMM (parameters θ) using the remaining subsequences
Methods
▪ Step 1: Divide complete sequences into overlapping subsequences
▪ Step 2: Randomly select a subsequence from each complete sequence to train an HMM
▪ Step 3: Vote for the learned HMM using the remaining subsequences
▪ Step 4: Go back to Step 2 and repeat enough times
▪ Step 5: Find the HMM with the highest vote and use it to identify the inlier subsequences
▪ Step 6: Retrain the HMM using the inlier subsequences
▪ Step 7: Identify inlier subsequences and merge them to get the signals of interest
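The seven steps can be sketched as follows. Here `train_hmm` and `loglik` are hypothetical stand-ins for EM training and HMM log-likelihood scoring, and `vote_thresh` is a free parameter; this is a structural sketch, not the thesis implementation:

```python
import numpy as np

def divide(seq, M):
    """Step 1: all overlapping subsequences of length M (L - M + 1 of them)."""
    L = len(seq)
    return [seq[i:i + M] for i in range(L - M + 1)]

def ransac_hmm(sequences, M, n_trials, train_hmm, loglik, vote_thresh):
    """Steps 2-6 of the procedure with pluggable training/scoring."""
    rng = np.random.default_rng(0)
    subs = [divide(s, M) for s in sequences]
    best_model, best_votes = None, -1
    for _ in range(n_trials):  # Step 4: repeat enough times
        # Step 2: one random subsequence per complete sequence.
        picked = [ss[rng.integers(len(ss))] for ss in subs]
        model = train_hmm(picked)
        # Step 3: subsequences vote for the model (all of them, for simplicity).
        votes = sum(loglik(model, x) > vote_thresh
                    for ss in subs for x in ss)
        # Step 5: keep the model with the highest vote.
        if votes > best_votes:
            best_model, best_votes = model, votes
    # Step 6: retrain on the inlier subsequences of the best model.
    inliers = [x for ss in subs for x in ss
               if loglik(best_model, x) > vote_thresh]
    return train_hmm(inliers), inliers
    # Step 7 (merging adjacent inliers into full signals) is omitted here.
```

In practice `train_hmm`/`loglik` would wrap an EM-trained HMM (e.g. hmmlearn's `GaussianHMM.fit`/`score`).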
Experiments
▪ Electrocorticographic (ECoG) data
• Data collection
• Preprocessing & feature extraction: spectral filter (70–170 Hz) + Hilbert transform
▪ Experiment protocol (Wang et al., IJCAI 2013): Stage 1, Stage 2, Stage 3
Experiments
▪ Data Statistics
• 4 subjects
• Each has 10 trials, each of length 800
• Average hit time
▪ Classification Results:
Part 2: Dynamic Regression under Insufficient Annotation
▪ Problem: Given sequential data x_1, …, x_T, we want to compute a regression function y_t = f(x_t), for some target value y_t
▪ Challenge:
• Only part of the sequential data are annotated
▪ Applications:
• Facial expression intensity estimation
• Sensor fault detection
• Part-of-speech tagging
▪ Our solution: Incorporate temporal information as additional constraints
Problem Statement
▪ Goal: Given an input set with partial labels, find a regression function from x to y
▪ Example: (x_1, y_1), x_2, x_3, x_4, (x_5, y_5); labeled frames V = {1, 5}
• The expression evolves onset → peak → offset, so intensity y first increases (y ↑) and then decreases (y ↓)
• Ordinal pairs E = {(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (2,5), (3,4), (3,5), (4,5)}
Ordinal Support Vector Regression (OSVR)
▪ Regression model with parameters
▪ Dataset with weak labels
▪ Optimization problem: Objective = Regression Loss + Ordinal Loss + Regularization
▪ Optimization method: alternating direction method of multipliers (ADMM)
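One plausible instantiation of this three-term objective, with an ε-insensitive loss on the labeled frames V and hinge-type ordinal losses on the pairs E (a sketch, not necessarily the exact formulation in the thesis):

```latex
\min_{\mathbf{w},\, b}\;
\underbrace{\tfrac{1}{2}\lVert \mathbf{w} \rVert^2}_{\text{regularization}}
+ C \underbrace{\sum_{i \in \mathbf{V}} \max\!\bigl(0,\; \lvert y_i - \mathbf{w}^\top \mathbf{x}_i - b \rvert - \epsilon\bigr)}_{\text{regression loss}}
+ C_o \underbrace{\sum_{(i,j) \in \mathbf{E}} \max\!\bigl(0,\; \mathbf{w}^\top \mathbf{x}_i - \mathbf{w}^\top \mathbf{x}_j + \delta\bigr)}_{\text{ordinal loss}}
```

where an edge (i, j) ∈ E encodes the weak label f(x_i) ≤ f(x_j), and δ ≥ 0 is a margin.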
Experiments: Facial Expression Intensity Estimation
▪ Overview
▪ Dataset: PAIN
▪ Features: Gabor features, landmarks, LBP + PCA
▪ Evaluation Criteria:
• Pearson correlation coefficient (PCC)
• Intra-class correlation (ICC)
• Mean absolute error (MAE)
Experiments
▪ PAIN dataset: experiments under different annotation settings
• Partial labels: about 8.8% of the total number of frames
• Use of ordinal information is very helpful
Experiments
▪ PAIN dataset: comparison with the state-of-the-art
Part 3: Classification and Synthesis under Large Intra-class Variation
▪ Challenge:
• The same underlying dynamic pattern can manifest significant intra-class variation
▪ Applications:
• Human action recognition
• Human motion synthesis
• Speech recognition
• Language translation
▪ Our Solution:
• Hidden Semi-Markov Model
• Bayesian Hierarchical Modeling
Methods
▪ Hidden Markov Model (HMM)
• Hidden states Z_1, …, Z_T generate observations X_1, …, X_T
• Parameters: initial distribution π, transition matrix A, emission model ψ
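Given π, A, and per-frame emission likelihoods from ψ, the HMM's sequence likelihood comes from the standard forward recursion; a minimal sketch with per-step scaling to avoid underflow, assuming the emission likelihoods have been precomputed into a matrix B:

```python
import numpy as np

def forward_loglik(pi, A, B):
    """Log-likelihood of an observation sequence under an HMM.
    pi: (S,) initial distribution; A: (S, S) transition matrix;
    B: (T, S) emission likelihoods p(x_t | z_t = s) for the observed x_t."""
    alpha = pi * B[0]                    # forward message at t = 1
    loglik = 0.0
    for t in range(1, len(B)):
        c = alpha.sum()                  # scaling constant
        loglik += np.log(c)
        alpha = (alpha / c) @ A * B[t]   # propagate and absorb emission
    return loglik + np.log(alpha.sum())
```

The same quantity (per class) is what a likelihood-based classifier compares across candidate models.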
Methods
▪ Hidden Semi-Markov Model (HSMM)
• Augments the HMM with explicit duration variables D_1, …, D_T for the hidden states
• Parameters: initial distribution π, transition matrix A, duration model C, emission model ψ
Methods
▪ Bayesian Hierarchical Dynamic Model (HDM)
• Hyperparameter α, together with (ξ, η, η_0, λ), defines a prior over the HSMM parameters θ = (π, A, C, ψ)
• The HSMM over the states, durations, and observations serves as the likelihood
Methods
▪ Learning: estimating hyperparameter α
• Overall objective (intractable)
• Approximate objective
• An alternating strategy
▪ Optimization methods: MAP-EM and ML
Methods
▪ Bayesian Inference: predictive likelihood
• Advantage: reduces overfitting and improves generalization
• Posterior inference method: Gibbs sampling
• Classification: pick the class with the highest predictive likelihood
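Concretely, predictive-likelihood classification averages the data likelihood over the Gibbs posterior samples of each class's parameters; a sketch, where `loglik` is a hypothetical stand-in for the model's sequence log-likelihood:

```python
import numpy as np

def predict_class(x, posterior_samples, loglik):
    """posterior_samples: {class c: [theta_m, ...]} Gibbs samples per class.
    Approximates p(x | c) = (1/M) * sum_m p(x | theta_m^c) and returns
    the class with the highest predictive likelihood."""
    scores = {}
    for c, thetas in posterior_samples.items():
        lls = np.array([loglik(th, x) for th in thetas])
        # log-mean-exp for numerical stability
        scores[c] = np.logaddexp.reduce(lls) - np.log(len(lls))
    return max(scores, key=scores.get)
```

Averaging over samples rather than using a single point estimate is what gives the robustness to intra-class variation claimed above.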
Experiments: Action Recognition
▪ Overview
▪ Datasets: MSR-Action3D, UTD-MHAD, G3D, Penn
▪ Features: joint position and motion in 3D or 2D
Experiments: Action Recognition
▪ Individual datasets: comparison with different baselines
Experiments: Action Recognition
▪ Individual datasets: comparison with the state-of-the-art
Experiments: Action Recognition
▪ Cross-dataset classification accuracy
• A: MSR
• B: UTD
• C: G3D
Methods: Adversarial Learning
▪ Adversarial learning: a better criterion for data synthesis
• Overall objective
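The overall objective here is presumably a GAN-style minimax criterion; the standard form (the thesis variant may differ) is:

```latex
\min_{\theta}\,\max_{\phi}\;
\mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}\bigl[\log D_\phi(\mathbf{x})\bigr]
+ \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\bigl[\log\bigl(1 - D_\phi(G_\theta(\mathbf{z}))\bigr)\bigr]
```

with generator parameters θ and discriminator parameters φ, matching the roles they play on the following slides.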
Methods: Adversarial Inference
▪ Bayesian adversarial inference
• Sample both the generator parameters θ (prior α_g) and the discriminator parameters φ (prior α_d)
• The generator maps (Z, D) to synthetic data X−; the discriminator scores real data X+ against X−
Methods: Adversarial Inference
▪ Bayesian adversarial inference
• Sample generator θ conditioned on φ: posterior ∝ likelihood × prior (prior α_g), with synthetic data X− generated from (Z, D)
▪ Inference method: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)
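A single SGHMC update (following Chen et al., 2014) takes a noisy gradient step with friction and matched injected noise; a sketch, where `grad_log_post` is an assumed stochastic gradient of the log-posterior (minibatch log-likelihood plus log-prior):

```python
import numpy as np

def sghmc_step(theta, v, grad_log_post, eta=1e-3, alpha=0.1, rng=None):
    """One SGHMC update with step size eta and friction alpha.
    Returns the new (theta, v); iterating yields approximate
    posterior samples of theta."""
    rng = rng or np.random.default_rng()
    # Injected noise with variance 2 * alpha * eta matches the friction term.
    noise = rng.normal(0.0, np.sqrt(2.0 * alpha * eta), size=theta.shape)
    v = (1.0 - alpha) * v + eta * grad_log_post(theta) + noise
    return theta + v, v
```

Repeatedly calling `sghmc_step` and recording `theta` after burn-in gives the posterior samples used for both the generator and the discriminator.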
Methods: Adversarial Inference
▪ Bayesian adversarial inference
• Sample discriminator φ conditioned on θ: posterior ∝ likelihood × prior (prior α_d); the discriminator scores real data X+ against synthetic data X−
▪ Inference method: Stochastic Gradient Hamiltonian Monte Carlo (SGHMC)
Methods
▪ Bayesian adversarial inference: data synthesis
• Overall synthesis target: generate new data from the posterior over the generator
• Steps: draw M posterior samples {θ_m} of the generator given the real data 𝒟+ (posterior sampling over (Z, D) with prior α_g), then generate new data using all the {θ_m}
Experiments: Motion Synthesis
▪ Datasets:
• CMU motion capture
• Berkeley motion capture
▪ Features: joint angles
▪ Qualitative results: real data vs. synthetic data
Experiments: Motion Synthesis
▪ Quantitative results on CMU and Berkeley
• Metric: BLEU score (fidelity)
Experiments: Motion Synthesis
▪ Quantitative results
• Inception score: diversity
• MMD: distribution-level similarity
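MMD compares the real and synthetic samples at the distribution level; a minimal sketch of the (biased) squared-MMD estimate with an RBF kernel, where the kernel choice and bandwidth `gamma` are assumptions, not necessarily the thesis's setup:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased squared maximum mean discrepancy between samples X (n, d)
    and Y (m, d) with kernel k(a, b) = exp(-gamma * ||a - b||^2).
    Near zero when the two sample sets come from the same distribution."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```

For motion data, X and Y would hold fixed-length feature vectors extracted from real and synthesized sequences respectively.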
Part 4: Modeling Complex Dynamics
▪ Challenges:
• Structural dependency
• Long-term temporal dependency
• Uncertainty and large variations
▪ Application: action recognition
▪ Our solution: Bayesian Graph Convolution LSTM
Methods
▪ Overall framework: the Bayesian GC-LSTM combines a graph convolution unit (GCU) with an LSTM, followed by a classifier; an additional discriminator operates on the training and testing data
Methods
▪ GC-LSTM
• Overall architecture
Methods
▪ GC-LSTM
• Graph convolution: generalizes convolution to arbitrary graph-structured data
• Layer update: multiply the N × N graph kernel F_k, the N × D_l node data H^(l), and the D_l × D_{l+1} kernel parameters W_k^(l), then add the bias b_k^(l)
• N: number of nodes; D_l: feature dimension of each node
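The layer update can be sketched as below; the ReLU nonlinearity and the sum over K graph kernels are assumptions where the slide shows only a single product term:

```python
import numpy as np

def gcu_layer(F, H, W, b):
    """One multi-kernel graph convolution layer:
    H_next = relu(sum_k F[k] @ H @ W[k] + b[k]).
    F: (K, N, N) graph kernels; H: (N, D_l) node features;
    W: (K, D_l, D_next) kernel parameters; b: (K, D_next) biases."""
    out = sum(F[k] @ H @ W[k] + b[k] for k in range(len(F)))
    return np.maximum(out, 0.0)  # ReLU nonlinearity (assumed)
```

For skeleton data, each F_k would encode a fixed joint-adjacency pattern (e.g. self, neighbors), so the multiplication F_k @ H mixes features across connected joints.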
Methods
▪ GC-LSTM
• Long short-term memory (LSTM) network: models long-term temporal dynamics
• Inputs X_1, …, X_T feed hidden states Z_1, …, Z_T, which produce outputs y_1, …, y_T
Methods
▪ Bayesian GC-LSTM
• Extend GC-LSTM to a probabilistic model: θ ~ P(θ | α_θ)
• Infer the posterior distribution of the parameters from the training data X+ (posterior ∝ likelihood × prior)
Methods
▪ Bayesian GC-LSTM
• Adversarial prior: use an additional discriminator to regularize the parameters
• Intuition: promote a feature representation that is invariant to the subject
Methods
▪ Bayesian Inference
• Classification: average predictions over posterior parameter samples
Experiments
▪ Ablation study (R: random rotation; N: random noise)
• Effect of graph convolution
• Effect of Bayesian inference
Experiments
▪ Comparison with the state-of-the-art on MSR Action3D, SYSU, and UTD MHAD
Experiments
▪ Generalization across different datasets
Thesis Summary
▪ Localization of dynamic patterns
• Propose a method that combines robust estimation and a dynamic model for localization
▪ Dynamic pattern regression under insufficient annotation
• Incorporate ordinal information as additional constraints for model learning
• Develop an optimization algorithm for parameter estimation
▪ Dynamic pattern classification and synthesis under large intra-class variation
• Propose a Bayesian hierarchical model
• Develop two Bayesian inference algorithms
▪ Modeling complex dynamics
• Propose a Bayesian neural network model
• Develop a Bayesian inference algorithm
Thank You!
▪ Related Publications:
• Rui Zhao, Quan Gan, Shangfei Wang, Qiang Ji, Facial Expression Intensity Estimation Using Ordinal Information, CVPR 2016.
• Rui Zhao, Md Ridwan Al Iqbal, Kristin Bennett, Qiang Ji, Wind Turbine Fault Prediction Using Soft Label SVM, ICPR 2016.
• Rui Zhao, Gerwin Schalk, Qiang Ji, Robust Signal Identification for Dynamic Pattern Classification, ICPR 2016.
• Rui Zhao, Qiang Ji, An Adversarial Hierarchical Hidden Markov Model for Human Pose Modeling and Generation, AAAI 2018.
• Rui Zhao, Gerwin Schalk, Qiang Ji, Temporal Pattern Localization using Mixed Integer Linear Programming, ICPR 2018.
• Rui Zhao, Qiang Ji, An Empirical Evaluation of Bayesian Inference Methods for Bayesian Neural Networks, NIPS Workshop 2018 (to appear).
• Rui Zhao, Wanru Xu, Hui Su, Qiang Ji, Bayesian Hierarchical Dynamic Model for Human Action Recognition (under review).
• Rui Zhao, Hui Su, Qiang Ji, Bayesian Adversarial Human Motion Synthesis (under review).
• Rui Zhao, Kang Wang, Hui Su, Qiang Ji, Bayesian Graph Convolution LSTM for Skeleton-based Action Recognition (under review).