Tracking with Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes
Presented by 陳群元
Outline
o Introduction
o Previous work
o Predicting motion patterns
o Spatio-temporal transition distribution
o Discerning pedestrians
o Experimental results
o Conclusion
Introduction
Tracking individuals in extremely crowded scenes is a challenging task.
We predict the local spatio-temporal motion patterns that describe pedestrian movement at each space-time location in the video.
We robustly model each individual's unique motion and appearance to distinguish that individual from the surrounding pedestrians.
Previous work
Previous work tracks features and associates similar trajectories to detect individual moving entities within crowded scenes.
We instead encode many possible motions in an HMM and derive a full distribution of the motion at each spatio-temporal location in the video.
Markov Model
An example: a 3-state Markov chain λ
o State 1 generates symbol A only, State 2 generates symbol B only, and State 3 generates symbol C only.
o Given a sequence of observed symbols O = {CABBCABC}, the only corresponding state sequence is {S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is
$P(O|\lambda) = P(q_0 = S_3)\,P(S_1|S_3)\,P(S_2|S_1)\,P(S_2|S_2)\,P(S_3|S_2)\,P(S_1|S_3)\,P(S_2|S_1)\,P(S_3|S_2) = 0.1 \times 0.3 \times 0.3 \times 0.7 \times 0.2 \times 0.3 \times 0.3 \times 0.2 = 0.00002268$
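$A = \begin{bmatrix} 0.6 & 0.3 & 0.1 \\ 0.1 & 0.7 & 0.2 \\ 0.3 & 0.2 & 0.5 \end{bmatrix}, \quad \pi = \begin{bmatrix} 0.4 & 0.5 & 0.1 \end{bmatrix}^T$
[Figure: state-transition diagram of the 3-state Markov chain. States s1, s2, and s3 emit A, B, and C respectively; the edges carry the transition probabilities of A.]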
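The worked product for the Markov chain can be checked numerically. The sketch below is illustrative only: it assumes the transition matrix and initial distribution implied by the factors in the worked product, and the function name `markov_chain_probability` is hypothetical.

```python
import numpy as np

# Assumed parameters: the values implied by the factors of the worked product.
pi = np.array([0.4, 0.5, 0.1])            # pi[i] = P(q_0 = S_{i+1})
A = np.array([[0.6, 0.3, 0.1],            # A[i, j] = P(S_{j+1} | S_{i+1})
              [0.1, 0.7, 0.2],
              [0.3, 0.2, 0.5]])
symbol_to_state = {"A": 0, "B": 1, "C": 2}  # each state emits exactly one symbol


def markov_chain_probability(observations: str) -> float:
    """Probability of an observed symbol string under the 3-state chain."""
    # Because every state emits a unique symbol, the state path is determined.
    states = [symbol_to_state[symbol] for symbol in observations]
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev, cur]
    return float(prob)


p = markov_chain_probability("CABBCABC")
print(p)
```

Because each state emits a single symbol, no summation over state sequences is needed here; that changes once the emissions become probabilistic, as in the HMM.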
Hidden Markov Model
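An example: a 3-state discrete HMM λ
$A = \begin{bmatrix} 0.6 & 0.3 & 0.1 \\ 0.1 & 0.7 & 0.2 \\ 0.3 & 0.2 & 0.5 \end{bmatrix}, \quad \pi = \begin{bmatrix} 0.4 & 0.5 & 0.1 \end{bmatrix}^T$
$b_1(A) = 0.3,\ b_1(B) = 0.2,\ b_1(C) = 0.5$
$b_2(A) = 0.7,\ b_2(B) = 0.1,\ b_2(C) = 0.2$
$b_3(A) = 0.3,\ b_3(B) = 0.6,\ b_3(C) = 0.1$
[Figure: state-transition diagram of the 3-state HMM. State s1 emits {A:.3, B:.2, C:.5}, s2 emits {A:.7, B:.1, C:.2}, and s3 emits {A:.3, B:.6, C:.1}; the edges carry the transition probabilities of A.]
o Given a sequence of observations O = {ABC}, there are 27 possible corresponding state sequences, and therefore the corresponding probability is
$P(O|\lambda) = \sum_{i=1}^{27} P(O, q_i|\lambda) = \sum_{i=1}^{27} P(O|q_i, \lambda)\,P(q_i|\lambda)$, where $q_i$ is a state sequence.
For example, when $q_i = \{S_2 S_2 S_3\}$:
$P(O|q_i, \lambda) = P(A|S_2)\,P(B|S_2)\,P(C|S_3) = 0.7 \times 0.1 \times 0.1 = 0.007$
$P(q_i|\lambda) = P(q_0 = S_2)\,P(S_2|S_2)\,P(S_3|S_2) = 0.5 \times 0.7 \times 0.2 = 0.07$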
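The sum over all 27 state sequences can be written out directly and compared with the forward algorithm, which computes the same quantity without enumerating the sequences. This is a minimal sketch, assuming the parameters of the 3-state example (initial distribution 0.4, 0.5, 0.1 and the emission tables shown in the diagram); the function names are hypothetical.

```python
import itertools

import numpy as np

# Assumed parameters of the 3-state discrete HMM example.
pi = np.array([0.4, 0.5, 0.1])
A = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.7, 0.2],
              [0.3, 0.2, 0.5]])
B = np.array([[0.3, 0.2, 0.5],   # rows: states S1..S3, columns: symbols A, B, C
              [0.7, 0.1, 0.2],
              [0.3, 0.6, 0.1]])
SYMBOLS = {"A": 0, "B": 1, "C": 2}


def joint_probability(obs, states):
    """P(O, q | lambda) for one state sequence q."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return float(p)


def brute_force_likelihood(obs):
    """Sum P(O, q | lambda) over every possible state sequence."""
    sequences = itertools.product(range(len(pi)), repeat=len(obs))
    return sum(joint_probability(obs, q) for q in sequences)


def forward_likelihood(obs):
    """Same quantity via the forward recursion, linear in len(obs)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())


obs = [SYMBOLS[c] for c in "ABC"]
print(brute_force_likelihood(obs), forward_likelihood(obs))
```

Enumeration costs 3^T terms for T observations, so it is only practical for toy cases like this one; the forward recursion is what makes HMM likelihoods usable on long sequences.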
Predicting Motion Patterns
Spatio-temporal gradient
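Each derivative is estimated with a central difference:
$f'(\mathrm{Pos}) = \dfrac{\big(f(\mathrm{Pos}+1) - f(\mathrm{Pos})\big) + \big(f(\mathrm{Pos}) - f(\mathrm{Pos}-1)\big)}{2} = \dfrac{f(\mathrm{Pos}+1) - f(\mathrm{Pos}-1)}{2}$
For each pixel i in the cuboid, the spatio-temporal gradient is $\nabla I_i = \left[\partial I/\partial x,\ \partial I/\partial y,\ \partial I/\partial t\right]^T$, where I is the intensity.
Spatio-temporal motion pattern
The local spatio-temporal motion pattern is represented by a 3D Gaussian of spatio-temporal gradients.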
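A motion pattern of this kind can be sketched as the sample mean and covariance of the gradients in one cuboid. The code below is an illustration under assumptions: the cuboid is indexed `[t, y, x]`, `np.gradient` stands in for the central-difference estimate (it uses one-sided differences at the borders), and the function name `motion_pattern` is hypothetical.

```python
import numpy as np


def motion_pattern(cuboid):
    """Fit a 3D Gaussian (mu, sigma) to the spatio-temporal gradients of a cuboid.

    cuboid: intensity array indexed [t, y, x] (an assumed layout).
    """
    cuboid = np.asarray(cuboid, dtype=float)
    # np.gradient returns one derivative array per axis, in axis order (t, y, x).
    d_t, d_y, d_x = np.gradient(cuboid)
    # One row per pixel: [I_x, I_y, I_t].
    grads = np.stack([d_x.ravel(), d_y.ravel(), d_t.ravel()], axis=1)
    mu = grads.mean(axis=0)
    centered = grads - mu
    sigma = centered.T @ centered / len(grads)
    return mu, sigma


# Usage: a 10x10x10 cuboid whose intensity is a linear ramp 2x + 3y - t.
t, y, x = np.meshgrid(np.arange(10), np.arange(10), np.arange(10), indexing="ij")
mu, sigma = motion_pattern(2 * x + 3 * y - t)
print(mu)
```

For the linear ramp every pixel has the same gradient, so the fitted Gaussian collapses to a point; real crowd footage yields a spread that the covariance captures.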
Training the HMM
The hidden states of the HMM are represented by a set of motion patterns.
The probability of an observed motion pattern given a hidden state s is
Kullback–Leibler divergence
Kullback–Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q.
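Since the motion patterns are Gaussians, the divergence between two of them has a closed form. The sketch below implements the standard formula for KL(P || Q) between multivariate normals; the function name `kl_gaussians` is hypothetical, and how the presentation turns this divergence into an observation probability is not shown here.

```python
import numpy as np


def kl_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p) and Q = N(mu_q, sigma_q)."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    sigma_p = np.asarray(sigma_p, dtype=float)
    sigma_q = np.asarray(sigma_q, dtype=float)
    k = mu_p.shape[0]
    sigma_q_inv = np.linalg.inv(sigma_q)
    diff = mu_q - mu_p
    # slogdet avoids overflow or underflow in the determinant.
    _, logdet_p = np.linalg.slogdet(sigma_p)
    _, logdet_q = np.linalg.slogdet(sigma_q)
    return float(0.5 * (np.trace(sigma_q_inv @ sigma_p)
                        + diff @ sigma_q_inv @ diff
                        - k
                        + logdet_q - logdet_p))


# Usage: the measure is not symmetric.
zero = np.zeros(3)
narrow, wide = np.eye(3), 4.0 * np.eye(3)
print(kl_gaussians(zero, narrow, zero, wide), kl_gaussians(zero, wide, zero, narrow))
```

Swapping the arguments changes the result, which is the non-symmetry mentioned above.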
Predictive distribution
After training a collection of HMMs on a video of typical crowd motion, we predict the motion pattern at each space-time location that contains the tracked subject.
where S is the set of hidden states, w(s) is defined by
Vector of scaled message
Reference: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.
Predicted local spatio-temporal motion pattern
The predicted pattern is a weighted sum of the 3D Gaussian distributions associated with the HMM's hidden states.
The centroid we are interested in is a multivariate normal density that minimizes the total distortion. Formally, a centroid c is defined as,
Reference: On Divergence Based Clustering of Normal Distributions and Its Application to HMM Adaptation.
Predicted motion pattern
where $\mu_s$ and $\Sigma_s$ are the mean and covariance of the hidden state s, respectively.
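One way to collapse a weighted set of Gaussians into a single Gaussian is moment matching, which gives the normal density minimizing the weighted sum of KL(N_s || c). Whether this is the exact centroid definition used in the presentation is an assumption, and the function name `predicted_pattern` is hypothetical.

```python
import numpy as np


def predicted_pattern(weights, means, covs):
    """Moment-matched Gaussian for a weighted set of Gaussians.

    weights: (S,) nonnegative weights w(s); means: (S, 3); covs: (S, 3, 3).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # make the weights sum to one
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu = np.einsum("s,si->i", w, means)
    # Weighted second moment: sum_s w(s) * (Sigma_s + mu_s mu_s^T).
    outer = np.einsum("si,sj->sij", means, means)
    second_moment = np.einsum("s,sij->ij", w, covs + outer)
    sigma = second_moment - np.outer(mu, mu)
    return mu, sigma


# Usage: two equally weighted states with opposite means along x.
mu, sigma = predicted_pattern(
    [0.5, 0.5],
    [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [np.eye(3), np.eye(3)],
)
print(mu, np.diag(sigma))
```

The spread between the state means is absorbed into the covariance, so disagreement among likely hidden states shows up as a less certain prediction.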
Spatio-temporal transition distribution
Bayesian probabilities
We use the gradient information to estimate the optical flow within each specific sub-volume and track the target in a Bayesian framework.
Bayesian tracking can be formulated as maximizing the posterior distribution of the state $x_t$ of the target at time t given the available measurements $z_{1:t} = \{z_i,\ i = 1, \ldots, t\}$:
$p(x_t | z_{1:t}) \propto p(z_t | x_t) \int p(x_t | x_{t-1})\, p(x_{t-1} | z_{1:t-1})\, dx_{t-1}$
Here $z_t$ is the image at time t, $p(x_t | x_{t-1})$ is the transition distribution, and $p(z_t | x_t)$ is the likelihood.
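The posterior recursion can be illustrated on a small discrete state space, where the integral over the previous state becomes a matrix product. This is a toy sketch only: the tracker described here works on a continuous state, and the three-position example, its numbers, and the function name `bayes_filter_step` are all hypothetical.

```python
import numpy as np


def bayes_filter_step(prior, transition, likelihood):
    """One recursion of a discrete Bayes filter.

    prior[j]         = p(x_{t-1} = j | z_{1:t-1})
    transition[j, i] = p(x_t = i | x_{t-1} = j)
    likelihood[i]    = p(z_t | x_t = i)
    """
    predicted = prior @ transition       # sum over the previous state
    posterior = likelihood * predicted   # weight by the new measurement
    return posterior / posterior.sum()   # normalize


# Toy example: a target on three positions that tends to move right.
prior = np.array([1.0, 0.0, 0.0])
transition = np.array([[0.2, 0.8, 0.0],
                       [0.0, 0.2, 0.8],
                       [0.0, 0.0, 1.0]])
likelihood = np.array([0.1, 0.7, 0.2])
posterior = bayes_filter_step(prior, transition, likelihood)
print(posterior, int(np.argmax(posterior)))
```

The maximizing state is the tracking estimate for the frame; the posterior then serves as the prior for the next frame.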
We define the state vector $x_t$ as the width, height, and 2D location of the target within the image.
We focus on the target's movement between frames and use a 2nd-degree autoregressive model for the transition distribution of the target's width and height.
Ideally, the state transition distribution $p(x_t | x_{t-1})$ directly reflects the two-dimensional motion of the target between frames t-1 and t.
It is parameterized by the 2D optical flow vector and a covariance matrix.
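Optical flow
Assuming the movement is small, the image constraint at I(x, y, t) can be expanded in a Taylor series to get
$I(x + \Delta x,\ y + \Delta y,\ t + \Delta t) = I(x, y, t) + \dfrac{\partial I}{\partial x}\Delta x + \dfrac{\partial I}{\partial y}\Delta y + \dfrac{\partial I}{\partial t}\Delta t + \text{H.O.T.}$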
The predicted motion pattern is defined by a mean gradient vector and a covariance matrix
The motion information encoded in the spatio-temporal gradients can be expressed in the form of the structure tensor matrix
The optical flow can then be estimated from the structure tensor by solving
where $w = [u, v, z]^T$ is the 3D optical flow.
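A hedged sketch of this step follows. It assumes the system being solved is the homogeneous one, G w = 0, that G is built from the predicted mean and covariance as Σ + μμᵀ (which equals the averaged outer product of the gradients), and that the 2D flow is obtained by dividing u and v by z. None of these details are spelled out in the slide text, and the function name `flow_from_pattern` is hypothetical.

```python
import numpy as np


def flow_from_pattern(mu, sigma):
    """Estimate 2D optical flow from a Gaussian motion pattern (mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Structure tensor up to scale: mean of g g^T over the gradients g.
    tensor = sigma + np.outer(mu, mu)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(tensor)
    w = eigvecs[:, 0]                    # least-squares solution of G w = 0
    if abs(w[2]) < 1e-12:
        raise ValueError("temporal component is close to zero; flow is undefined")
    return w[:2] / w[2]                  # assumed normalization to 2D flow


# Usage: synthetic gradients obeying brightness constancy for flow (1.5, -0.5).
rng = np.random.default_rng(0)
g_x, g_y = rng.normal(size=(2, 500))
g_t = -(1.5 * g_x - 0.5 * g_y)
grads = np.stack([g_x, g_y, g_t], axis=1)
flow = flow_from_pattern(grads.mean(axis=0), np.cov(grads.T, bias=True))
print(flow)
```

Taking the eigenvector of the smallest eigenvalue gives a solution even when noise makes G full rank, which is why it is used here instead of an exact null-space computation.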
Covariance matrix
Discerning pedestrians
Typical models of the likelihood distribution $p(z_t | x_t)$ are defined by a variance, a distance measure, and a normalization term Z.
The distance measure is the difference between a region R (defined by state $x_t$) of the observed image $z_t$ and the template.
We assume pedestrians exhibit consistency in their appearance and their motion, and model both in a joint likelihood built from the appearance likelihood $p_A$ and the motion likelihood $p_M$.
Update motion template
After tracking in frame t, we update each pixel i in the motion template using the motion template at time t, the region of spatio-temporal gradients defined by the tracking result (i.e., the expected value of the posterior), and the learning rate.
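The exact update formula is not recoverable from the slide text. A common choice for a template update with a learning rate is an exponential moving average, and the sketch below assumes that form; the function name `update_template` and the array layout (one gradient vector per pixel) are hypothetical.

```python
import numpy as np


def update_template(template, tracked_gradients, learning_rate=0.05):
    """Blend the tracked gradients into the motion template.

    Assumed form: new = (1 - rate) * old + rate * tracked, applied per pixel.
    template, tracked_gradients: arrays of shape (num_pixels, 3).
    """
    template = np.asarray(template, dtype=float)
    tracked = np.asarray(tracked_gradients, dtype=float)
    if template.shape != tracked.shape:
        raise ValueError("template and tracked gradients must have the same shape")
    return (1.0 - learning_rate) * template + learning_rate * tracked


# Usage: a zero template nudged toward gradients of all ones.
updated = update_template(np.zeros((4, 3)), np.ones((4, 3)))
print(updated[0])
```

A small rate keeps the template stable against a single bad frame, while a rate of 1 would replace the template outright.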
Update the error measurement
The error at pixel i and time t becomes
where $t_i$ and $r_i$ are the normalized gradient vectors of the motion template and the tracking result at time t.
To reduce the contribution of frequently changing pixels to the computation of the motion likelihood, we weight each pixel in the likelihood's distance measure.
where Z is a normalization term such that
Distance measure
The distance measure of the motion likelihood distribution becomes
Experimental results
The training video for the concourse scene contains 300 frames (about 10 seconds of video), and the video for the ticket gate scene contains 350 frames.
We set the cuboid size to 10 × 10 × 10 for both scenes.
The learning rate, appearance variance, and motion variance are all set to 0.05.
Conclusion
In this paper, we derived a novel probabilistic method that exploits the inherent spatially and temporally varying structured pattern of a crowd's motion to track individuals in extremely crowded scenes.
The end. Thank you.