ROAM: Recurrently Optimizing Tracking Model
Tianyu Yang1,2 Pengfei Xu3 Runbo Hu3 Hua Chai3 Antoni B. Chan2
1 Tencent AI Lab 2 City University of Hong Kong 3 Didi Chuxing
Abstract
In this paper, we design a tracking model consisting of
response generation and bounding box regression, where
the first component produces a heat map to indicate the
presence of the object at different positions and the second
part regresses the relative bounding box shifts to anchors
mounted on sliding-window locations. Thanks to the resiz-
able convolutional filters used in both components to adapt
to the shape changes of objects, our tracking model does
not need to enumerate different sized anchors, thus saving
model parameters. To effectively adapt the model to appear-
ance variations, we propose to offline train a recurrent neu-
ral optimizer to update the tracking model in a meta-learning
setting, which can converge the model in a few gradient
steps. This improves the convergence speed of updating the
tracking model while achieving better performance. We ex-
tensively evaluate our trackers, ROAM and ROAM++, on
the OTB, VOT, LaSOT, GOT-10K and TrackingNet bench-
marks, and our methods perform favorably against state-of-
the-art algorithms.
1. Introduction
Generic visual object tracking is the task of estimating
the bounding box of a target in a video sequence given only
its initial position. Typically, the preliminary model learned
from the first frame needs to be updated continuously to
adapt to the target’s appearance variations caused by rota-
tion, illumination, occlusion, deformation, etc. However,
it is challenging to optimize the initial learned model effi-
ciently and effectively as tracking proceeds. Training sam-
ples for model updating are usually collected based on es-
timated bounding boxes, which could be inaccurate. Those
small errors will accumulate over time, gradually resulting
in model degradation.
To avoid model updating, which may introduce unre-
liable training samples that ruin the model, several ap-
proaches [4, 43] investigate tracking by only comparing the
first frame with the subsequent frames, using a similarity
function based on a learned discriminative and invariant deep
Siamese feature embedding. However, training such a deep
representation is difficult due to drastic appearance varia-
tions that commonly emerge in long-term tracking. Other
methods either update the model via an exponential moving
average of templates [16, 44], which marginally improves
the performance, or optimize the model with hand-designed
SGD methods [33, 41], which need numerous iterations
to converge, thus preventing real-time speeds. Limiting the
number of SGD iterations can allow near real-time speeds,
but at the expense of poor quality model updates due to the
loss function not being optimized sufficiently.
In recent years, much effort has been devoted to localizing
the object using a robust online-learned classifier, while little
attention has been paid to designing accurate bounding box
estimation. Most trackers simply resort to multi-scale search
by assuming that the object's aspect ratio does not change
during tracking, which is often violated in the real world. Re-
cently, SiamRPN [23] borrows the idea of region proposal
networks [37] in object detection to decompose tracking
task into two branches: 1) classifying the target from the
background, and 2) regressing the accurate bounding box
with reference to anchor boxes mounted on different
positions. As shown on the VOT benchmarks [19, 20],
SiamRPN achieves higher precision on bounding box esti-
mation but suffers lower robustness compared with state-of-
the-art methods [8, 9, 26] due to no online model updating.
Furthermore, SiamRPN mounts anchors with different as-
pect ratios on every spatial location of the feature map to
handle possible shape changes, which is redundant in both
computation and storage.
In this paper, we propose a tracking framework which is
composed of two modules: response generation and bound-
ing box regression, where the first component produces a re-
sponse map indicating the probability that anchor boxes
mounted on sliding-window positions cover the object, and
the second part predicts bounding box shifts from the an-
chors to get refined rectangles. Instead of enumerating dif-
ferent aspect ratios of anchors as in SiamRPN, we propose
to use a single anchor size for each position and adapt
it to shape changes by resizing its corresponding convolu-
tional filter using bilinear interpolation, which saves model
parameters and computing time. To effectively adapt the
tracking model to appearance changes during tracking, we
propose a recurrent model optimization method to learn a
more effective gradient descent that converges the model
update in 1-2 steps, and generalizes better to future frames.
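The filter-resizing idea can be sketched with plain bilinear interpolation over the spatial axes of the filter weights; `resize_filter` below is a hypothetical helper illustrating the mechanism, not the authors' implementation:

```python
import numpy as np

def resize_filter(weights, new_h, new_w):
    """Bilinearly resize a conv filter of shape (C_out, C_in, H, W) to
    (C_out, C_in, new_h, new_w), so one filter can follow shape changes
    of the target instead of enumerating multiple anchor sizes."""
    c_out, c_in, h, w = weights.shape
    # Sample positions on the original grid (align-corners style).
    ys = np.linspace(0.0, h - 1, new_h)
    xs = np.linspace(0.0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, None, :, None]   # fractional row offsets
    wx = (xs - x0)[None, None, None, :]   # fractional column offsets
    # Gather the four neighbors of every output tap and blend them.
    tl = weights[:, :, y0][:, :, :, x0]
    tr = weights[:, :, y0][:, :, :, x1]
    bl = weights[:, :, y1][:, :, :, x0]
    br = weights[:, :, y1][:, :, :, x1]
    top = tl * (1 - wx) + tr * wx
    bot = bl * (1 - wx) + br * wx
    return top * (1 - wy) + bot * wy
```

Resizing to the original size is the identity, while resizing to a wider grid stretches the same filter to a new aspect ratio without adding any parameters.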
The key idea is to train a neural optimizer that can con-
verge the tracking model to a good solution in a few gradient
steps. During the training phase, the tracking model is first
updated using the neural optimizer, and then it is applied
on future frames to obtain an error signal for minimization.
Under this particular setting, the resulting optimizer con-
verges the tracking classifier significantly faster than SGD-
based optimizers, especially for learning the initial tracking
model. In summary, our contributions are:
• We propose a tracking model consisting of a resizable
response generator and a bounding box regressor, where
only a single anchor size is used at each spatial posi-
tion, and its corresponding convolutional filter can be
adapted to shape variations by bilinear interpolation.
• We propose a recurrent neural optimizer, which is
trained in a meta-learning setting, that recurrently up-
dates the tracking model with faster convergence.
• We conduct comprehensive experiments on large-scale
datasets including OTB, VOT, LaSOT, GOT10k and
TrackingNet, and our trackers achieve favorable per-
formance compared with the state-of-the-art.
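The update-then-evaluate training scheme described above can be illustrated on a toy one-parameter model. Here a scalar learned step size stands in for the recurrent optimizer, and finite differences stand in for backpropagating through the update; all names and constants are illustrative:

```python
import numpy as np

def tracking_loss(theta, x, y):
    """Squared error of a toy linear 'tracking model' y ~ theta * x."""
    return np.mean((theta * x - y) ** 2)

def loss_grad(theta, x, y):
    return np.mean(2.0 * (theta * x - y) * x)

def meta_train_step(theta0, lr, frames, future, meta_lr=0.001, eps=1e-4):
    """One meta-iteration: update the model with the learned step size
    `lr` on historical frames, measure the loss on a future frame, and
    nudge `lr` downhill on that meta loss. ROAM trains a recurrent
    network this way; a scalar rate keeps the sketch tiny."""
    def meta_loss(lr_):
        th = theta0
        for x, y in frames:                # a few gradient steps, as in the paper
            th = th - lr_ * loss_grad(th, x, y)
        return tracking_loss(th, *future)  # evaluated on the future frame
    g = (meta_loss(lr + eps) - meta_loss(lr - eps)) / (2 * eps)
    return lr - meta_lr * g                # train the optimizer, not the model
```

Iterating `meta_train_step` drives `lr` toward a value for which one or two gradient steps already fit the future frame well.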
2. Related Work
Visual Tracking Predicting a heat map to indicate the
position of the object is common in the visual tracking com-
munity [3, 4, 9, 13, 16, 45]. Among them, SiamFC [4]
is one of the most popular methods due to its fast speed
and good performance. However, most response generation
based trackers, including SiamFC, estimate the bounding
box via a simple multi-scale search mechanism, which is un-
able to handle aspect ratio changes. To address this issue,
recent SiamRPN [23] and its extensions [11, 22, 52] pro-
pose to train a bounding box regressor as in object detec-
tion [37], showing impressive performance. Different from
these algorithms, which enumerate a set of predefined an-
chors with different aspect ratios on each spatial position,
we adopt a resizable anchor to adapt to the shape variation
of the object, which saves model parameters and computing time.
Online model updating is another important module that
SiamFC lacks. Recent works improve SiamFC [4] by intro-
ducing various model updating strategies, including recur-
rent generation of target template filters through a convo-
lutional LSTM [48], a dynamic memory network [49, 50],
where object information is written into and read from an
addressable external memory, and distractor-aware incre-
mental learning [53], which makes use of hard-negative tem-
plates around the target to suppress distractors. It should
be noted that all these algorithms essentially achieve model
updating by linearly interpolating old target templates with
the newly generated one, in which the major difference is
how to control the weights when combining them. This is
far from optimal compared with optimization methods us-
ing gradient descent, which minimize the tracking loss di-
rectly to adapt to new target appearances. Instead of using
a Siamese network to build the convolutional filter, other
methods [26, 34, 41] generate the filter by performing gra-
dient descent on the first frame, which could be continuously
optimized during subsequent frames. In particular, [34] pro-
poses to train the initial tracking model in a meta-learning
setting, which shows promising results. However, it still
uses traditional SGD to optimize the tracking model dur-
ing the subsequent frames, which is not effective at adapting
to new appearances and is slow in updating the model. In contrast
to these trackers, our off-line learned recurrent neural opti-
mizer applies meta-learning to both the initial model and
model updates, which allows model initialization and up-
dating in only one or two gradient steps, resulting in much
faster runtime speed, and better accuracy.
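The contrast drawn above can be written in a few lines: template-combination methods linearly interpolate old and new templates, while gradient-based methods minimize the tracking loss directly. A minimal sketch (the value of `rho` and the helper names are illustrative, not from any cited tracker):

```python
import numpy as np

def ema_update(template, new_template, rho=0.01):
    """Template-combination update used by many Siamese trackers:
    linearly interpolate the old template with the newly generated
    one; rho controls the combination weight."""
    return (1 - rho) * template + rho * new_template

def gradient_update(theta, grad_fn, lr=0.1):
    """Gradient-descent update, which instead minimizes the tracking
    loss directly to adapt to new target appearances."""
    return theta - lr * grad_fn(theta)
```

The first rule can only blend appearances it has already seen, whereas the second actively reduces the loss on the latest samples.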
Learning to Learn. Learning to learn or meta-learning
has a long history [2, 32, 38]. With the recent successes of
applying meta-learning on few-shot classification [31, 36]
and reinforcement learning [12, 39], it has regained atten-
tion. The pioneering work [1] designs an off-line learned
optimizer using gradient descent and shows promising per-
formance compared with traditional optimization methods.
However, it does not generalize well for large numbers of
descent steps. To mitigate this problem, [28] proposes sev-
eral training techniques, including parameter scaling and
combination with convex functions to coordinate the learn-
ing process of the optimizer. [46] also addresses this issue
by designing a hierarchical RNN architecture with dynami-
cally adapted input and output scaling. In contrast to other
works that output an increment for each parameter update,
which is prone to overfitting due to different gradient scales,
we instead associate an adaptive learning rate produced by
a recurrent neural network with the computed gradient for
fast convergence of the model update.
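The adaptive-learning-rate idea can be sketched with a toy recurrent rule: a hidden state summarizes the gradient history, and the output is a positive per-parameter rate that scales the raw gradient. The tanh cell and the constants `a`, `b`, `c` are illustrative stand-ins, not ROAM's network:

```python
import numpy as np

def optimizer_step(theta, grad, h, a=0.5, b=0.5, c=1.0):
    """One recurrent update: refresh the hidden state from the incoming
    gradient, emit a positive learning rate, and scale the raw gradient
    with it (rather than emitting a parameter increment directly)."""
    h = np.tanh(a * h + b * grad)
    lr = np.exp(c * h - 2.0)   # positive rate; the -2 bias keeps steps small
    return theta - lr * grad, h

# Minimize f(theta) = theta**2 (gradient 2*theta) with the recurrent rule.
theta, h = 1.0, 0.0
for _ in range(30):
    theta, h = optimizer_step(theta, 2 * theta, h)
```

Because the network only modulates the step size, different gradient scales cannot push the update in an arbitrary direction, which is the overfitting issue noted above for increment-producing optimizers.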
3. Proposed Algorithm
Our tracker consists of two main modules: 1) a tracking
model that is resizable to adapt to shape changes; and 2) a
neural optimizer that is in charge of model updating. The
tracking model contains two branches where the response
generation branch determines the presence of the target by pre-
dicting a confidence score map and the bounding box re-
gression branch estimates the precise box of the target by
regressing coordinate shifts from the box anchors mounted
on the sliding-window positions. The offline learned neu-
ral optimizer is trained using a meta-learning framework to
online update the tracking model in order to adapt to appear-
ance variations. Note that both response generation and bound-
ing box regression are built on the feature map computed
by the backbone CNN. The whole framework is
briefly illustrated in Fig. 1.
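The decoding step that combines the two branches can be sketched as follows: take the sliding-window cell with the highest response, then refine the anchor mounted there with the regressed shifts. The (dx, dy, dw, dh) parametrization and stride follow the common RPN convention and are assumptions, not necessarily ROAM's exact encoding:

```python
import numpy as np

def decode_prediction(response, shifts, anchor_wh, stride=8):
    """Pick the sliding-window cell with the highest response score and
    refine the anchor mounted there with the regressed shifts."""
    h, w = response.shape
    r, c = divmod(int(np.argmax(response)), w)
    cx, cy = c * stride, r * stride        # anchor center at the best cell
    dx, dy, dw, dh = shifts[:, r, c]
    aw, ah = anchor_wh
    cx, cy = cx + dx * aw, cy + dy * ah    # shift the center
    bw, bh = aw * np.exp(dw), ah * np.exp(dh)  # rescale the size
    return cx, cy, bw, bh
```

With zero shifts the anchor is returned unchanged; nonzero shifts move and rescale it, which is why a single anchor size per cell suffices.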
3.1. Resizable Tracking Model
Trackers like correlation filter [16] and MetaTracker [34]
initialize a convolutional filter based on the size of the object
in the first frame and keep its size fixed during subsequent
frames. This setting is based on the assumption that the as-
[Figure 1: Overview of the offline training of the neural optimizer. A CNN feature extractor computes features for historical frames F1...FN and a future frame F(t+δ). The tracking model θ(t−1) (response filter θcf and regression filter θreg) is updated by the recurrent neural optimizer O using the update loss ℓ(t−1) and adaptive learning rate λ(t−1); the updated model θ(t) is then applied to the future frame to yield the meta loss L(t).]
θ(t)
reg<latexit sha1_base64="7PyvvU6/7dmZWCqBgep8/7HuBgI=">AAACEHicbVDLSsNAFJ34rPVVdekmWMS6KUkt6LLgxmUF+4Amlsnkth06eTBzI5SQT3Djr7hxoYhbl+78G6dtFtp6YJjDOfdy7z1eLLhCy/o2VlbX1jc2C1vF7Z3dvf3SwWFbRYlk0GKRiGTXowoED6GFHAV0Ywk08AR0vPH11O88gFQ8Cu9wEoMb0GHIB5xR1FK/dOYgFz6kjhcJX00C/aUOjgBplt2nFTzP+qmEYdYvla2qNYO5TOyclEmOZr/05fgRSwIIkQmqVM+2YnRTKpEzAVnRSRTElI3pEHqahjQA5aazgzLzVCu+OYikfiGaM/V3R0oDNV1WVwYUR2rRm4r/eb0EB1duysM4QQjZfNAgESZG5jQd0+cSGIqJJpRJrnc12YhKylBnWNQh2IsnL5N2rWpfVGu39XKjnsdRIMfkhFSITS5Jg9yQJmkRRh7JM3klb8aT8WK8Gx/z0hUj7zkif2B8/gC4dp46</latexit>
θ(t)
<latexit sha1_base64="nakvMEc7d7eXvAzbtTwA54NYxrg=">AAACAnicbVDLSsNAFJ34rPUVdSVuBotQNyWpBV0W3LisYB/QxjKZTNqhkwczN0IJwY2/4saFIm79Cnf+jZM2C209MMzhnHu59x43FlyBZX0bK6tr6xubpa3y9s7u3r55cNhRUSIpa9NIRLLnEsUED1kbOAjWiyUjgStY151c5373gUnFo/AOpjFzAjIKuc8pAS0NzeOBGwlPTQP9pQMYMyDZfVqF82xoVqyaNQNeJnZBKqhAa2h+DbyIJgELgQqiVN+2YnBSIoFTwbLyIFEsJnRCRqyvaUgCppx0dkKGz7TiYT+S+oWAZ+rvjpQEKt9SVwYExmrRy8X/vH4C/pWT8jBOgIV0PshPBIYI53lgj0tGQUw1IVRyvSumYyIJBZ1aWYdgL568TDr1mn1Rq982Ks1GEUcJnaBTVEU2ukRNdINaqI0oekTP6BW9GU/Gi/FufMxLV4yi5wj9gfH5A+d+l7M=</latexit>
B1...Bi...BN<latexit sha1_base64="hsoxTdlhufDp6P+GOGImM1f2m3c=">AAAB/HicbVDLSsNAFJ3UV62vaJduBovgKiS1oMtSN66kgn1AG8JkOmmHTiZhZiKEEH/FjQtF3Poh7vwbp2kW2nrgXg7n3MvcOX7MqFS2/W1UNja3tnequ7W9/YPDI/P4pC+jRGDSwxGLxNBHkjDKSU9RxcgwFgSFPiMDf36z8AePREga8QeVxsQN0ZTTgGKktOSZ9Y6XObllWR2PFj27yz2zYVt2AbhOnJI0QImuZ36NJxFOQsIVZkjKkWPHys2QUBQzktfGiSQxwnM0JSNNOQqJdLPi+Byea2UCg0jo4goW6u+NDIVSpqGvJ0OkZnLVW4j/eaNEBdduRnmcKMLx8qEgYVBFcJEEnFBBsGKpJggLqm+FeIYEwkrnVdMhOKtfXif9puVcWs37VqPdKuOoglNwBi6AA65AG9yCLugBDFLwDF7Bm/FkvBjvxsdytGKUO3XwB8bnDz/0ky4=</latexit>
B(t+δ)<latexit sha1_base64="r7SiNohxYD8aMpeFDZsmONt/qn0=">AAAB9XicbVBNS8NAEN3Ur1q/qh69LBahIpSkFvRY9OKxgv2ANi2bzaZdutmE3YlSQv+HFw+KePW/ePPfuG1z0NYHA4/3ZpiZ58WCa7Dtbyu3tr6xuZXfLuzs7u0fFA+PWjpKFGVNGolIdTyimeCSNYGDYJ1YMRJ6grW98e3Mbz8ypXkkH2ASMzckQ8kDTgkYqX/TT8tw0fOZAHI+HRRLdsWeA68SJyMllKExKH71/IgmIZNABdG669gxuClRwKlg00Iv0SwmdEyGrGuoJCHTbjq/eorPjOLjIFKmJOC5+nsiJaHWk9AznSGBkV72ZuJ/XjeB4NpNuYwTYJIuFgWJwBDhWQTY54pREBNDCFXc3IrpiChCwQRVMCE4yy+vkla14lxWqve1Ur2WxZFHJ+gUlZGDrlAd3aEGaiKKFHpGr+jNerJerHfrY9Gas7KZY/QH1ucPpbOR6g==</latexit>
Figure 1: Pipeline of ROAM++. Given a mini-batch of training patches, which are cropped based on the predicted object boxes, deep features are extracted by the Feature Extractor. The fixed-size tracking model θ(t−1) is warped to the current target size, yielding the warped tracking model θ̄(t−1), as in (2, 3). The response map and bounding boxes are then predicted for each sample using θ̄(t−1), from which the update loss ℓ(t−1) and its gradient ∇θ(t−1)ℓ(t−1) are computed using ground-truth labels. Next, the element-wise stack I(t−1), consisting of the previous learning rates, current parameters, current update loss and its gradient, is input into a coordinate-wise LSTM O to generate the adaptive learning rate λ(t−1) as in (11). The model is then updated using one gradient descent step (denoted by ⊖) as in (9). Finally, we apply the updated model θ(t) on a randomly selected future frame to get a meta loss for minimization as in (13).
pect ratio of the object does not change during tracking, an assumption that is often violated. Therefore, dynamically adapting the convolutional filter to the object's shape variations is desirable, which means the number of filter parameters may vary between frames in a video and among different sequences. However, this complicates the design of the neural optimizer when using separate learning rates for each filter parameter. To simplify the meta-learning framework and to better allow for per-parameter learning rates, we define fixed-shape convolutional filters, which are warped to the desired target size using bilinear interpolation before convolving with the feature map. In subsequent frames, the recurrent optimizer updates the fixed-shape tracking model. Note that MetaTracker [34] also resizes filters to the object size for model initialization. However, instead of dynamically adapting the convolutional filters to the object size in subsequent frames, MetaTracker keeps the initial filter shape for all following frames.
Specifically, the tracking model θ contains two parts, i.e., the correlation filter θcf and the bounding box regression filter θreg. Both are warped to adapt to the shape variation of the target:

θ = [θcf, θreg], (1)
θ̄cf = W(θcf, φ), (2)
θ̄reg = W(θreg, φ), (3)
where W resizes the convolutional filter to size φ = (fr, fc) using bilinear interpolation. The filter size is computed from the width and height (w, h) of the object in the previous image patch (and for symmetry the filter size is odd):

fr = ⌈ρh/c⌉ − ⌈ρh/c⌉ mod 2 + 1, (4)
fc = ⌈ρw/c⌉ − ⌈ρw/c⌉ mod 2 + 1, (5)
where ρ is the scale factor that enlarges the filter to cover some context information, c is the stride of the feature map, and ⌈·⌉ denotes ceiling. Because of the resizable filters, there is no need to enumerate anchor boxes of different aspect ratios and scales when performing bounding box regression. We use only one anchor at each spatial location, whose size corresponds to the shape of the regression filter:

(aw, ah) = (fc, fr)/ρ. (6)
This saves regression filter parameters and achieves faster
speed. Note that we update the filter size and its correspond-
ing anchor box every τ frames, i.e. just before every model
updating, during both offline training and testing/tracking
phases. Through this modification, we are able to initial-
ize the tracking model with θ(0) and recurrently optimize
it in subsequent frames without worrying about the shape
changes of the tracked object.
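As a concrete illustration of (4)–(6), the filter and anchor sizes can be computed as follows. This is a minimal sketch in plain Python (the function names are ours, not from the released code), using the defaults ρ = 1.5 and feature stride c = 4 given in Sec. 5:

```python
import math

def filter_size(w, h, rho=1.5, c=4):
    """Eqs. (4)-(5): scale the object size (w, h) by the context factor rho,
    divide by the feature stride c, and force the result to be odd."""
    fr = math.ceil(rho * h / c) - math.ceil(rho * h / c) % 2 + 1
    fc = math.ceil(rho * w / c) - math.ceil(rho * w / c) % 2 + 1
    return fr, fc

def anchor_size(fr, fc, rho=1.5):
    """Eq. (6): the single per-location anchor, in feature-map units."""
    return fc / rho, fr / rho

fr, fc = filter_size(60, 40)   # e.g. a 60x40 object
aw, ah = anchor_size(fr, fc)
```

Note that the `− ⌈·⌉ mod 2 + 1` term maps an even ceiling up to the next odd integer and keeps an odd ceiling unchanged, so the filter always has a well-defined center.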
3.2. Recurrent Model Optimization
Traditional optimization methods have the problem of
slow convergence due to small learning rates and limited
training samples, while simply increasing the learning rate
has the risk of the training loss wandering wildly. Instead,
we design a recurrent neural optimizer, which is trained to converge the model to a good solution in a few gradient steps¹, to update the tracking model. Our key idea is based
on the assumption that the best optimizer should be able to
update the model to generalize well on future frames. Dur-
ing the offline training stage, we perform a one step gradient
update on the tracking model using our recurrent neural op-
timizer and then minimize its loss on future frames. Once
the offline learning stage is finished, we use this learned
neural optimizer to recurrently update the tracking model
to adapt to appearance variations. The optimizer trained in
this way will be able to quickly converge the model update
to generalize well for future frame tracking.
We denote the response generation network as G(F; θ̄cf, φ) and the bounding box regression network as R(F; θ̄reg, φ), where F is the feature map input and θ are the parameters. The tracking loss consists of two parts, a response loss and a regression loss:

L(F, M, B; θ, φ) = ‖G(F; θ̄cf, φ) − M‖² + ‖R(F; θ̄reg, φ) − B‖s, (7)

where the first term is the L2 loss and the second term is the smooth L1 loss [37], and B represents the ground truth box. Note that we adopt the parameterization of bounding box coordinates as in [37]. M is the corresponding label map, which is built using a 2D Gaussian function and the ground-truth object location (x0, y0) and size (w, h):
M(x, y) = exp(−α((x − x0)²/σx² + (y − y0)²/σy²)), (8)
where (σx, σy) = (w/c, h/c), and α controls the shape
of the response map. Note that we use past predictions as
pseudo-labels when performing model updating during test-
ing. We only use the ground truth during offline training.
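The label construction in (8) can be sketched in a few lines of plain Python. This is an illustration only (the actual implementation builds the map as a tensor on the feature grid); the function name is ours:

```python
import math

def gaussian_label_map(size, cx, cy, w, h, alpha=20, c=4):
    """Eq. (8): 2D Gaussian response label on a size x size grid,
    centred at (cx, cy) with (sigma_x, sigma_y) = (w/c, h/c)."""
    sx, sy = w / c, h / c
    return [[math.exp(-alpha * ((x - cx) ** 2 / sx ** 2 +
                                (y - cy) ** 2 / sy ** 2))
             for x in range(size)] for y in range(size)]

# peak value 1 at the object centre, decaying with distance
M = gaussian_label_map(21, cx=10, cy=10, w=40, h=40)
```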
A typical tracking process updates the tracking model
using historical training examples and then tests this up-
dated model on the following frames until the next update.
We simulate this scenario in a meta-learning paradigm by
recurrently optimizing the tracking model, and then testing
it on a future frame. Specifically, the tracking network is
updated by
θ(t) = θ(t−1) − λ(t−1) ⊙ ∇θ(t−1)ℓ(t−1), (9)
where λ(t−1) is a fully element-wise learning rate that
has the same dimension as the tracking model parameters
¹ We only use one gradient step in our experiments; extending to multiple steps is straightforward.
θ(t−1), and ⊙ is element-wise multiplication. The learn-
ing rate is recurrently generated using an LSTM with input
consisting of the previous learning rate λ(t−2), the current
parameters θ(t−1), the current update loss ℓ(t−1) and its gra-
dient ▽θ(t−1)ℓ(t−1),
I(t−1) = [λ(t−2), ∇θ(t−1)ℓ(t−1), θ(t−1), ℓ(t−1)], (10)
λ(t−1) = σ(O(I(t−1); ω)), (11)
where O(·;ω) is a coordinate-wise LSTM [1] parameter-
ized by ω, which shares the parameters across all dimen-
sions of input, and σ is the sigmoid function to bound the
predicted learning rate. The LSTM input I(t−1) is formed by element-wise stacking the 4 sub-inputs along a new axis². The current update loss ℓ(t−1) is computed from a mini-batch of n updating samples:

ℓ(t−1) = (1/n) ∑_{j=1}^{n} L(Fj, Mj, Bj; θ(t−1), φ(t−1)), (12)
where the updating samples (Fj ,Mj , Bj) are collected
from the previous τ frames, where τ is the frame interval
between model updates during online tracking. Finally, we
test the newly updated model θ(t) on a randomly selected future frame³ and obtain the meta loss:

L(t) = L(F(t+δ), M(t+δ), B(t+δ); θ(t), φ(t−1)), (13)
where δ is randomly selected within [0, τ − 1]. During the offline training stage, we perform the aforementioned procedure on a mini-batch of videos and obtain an averaged meta loss for optimizing the neural optimizer:
L = (1/NT) ∑_{i=1}^{N} ∑_{t=1}^{T} L(t)_{Vi}, (14)
where N is the batch size and T is the number of model up-
dates, and Vi ∼ p(V) is a video clip sampled from the train-
ing set. It should be noted that the initial tracking model
parameter θ(0) and initial learning rate λ(0) are also train-
able variables, which are jointly learned with the neural op-
timizer O. By minimizing the averaged meta loss L, we
aim to train a neural optimizer that can update the track-
ing model to generalize well on subsequent frames, as well
as to learn a beneficial initialization of the tracking model,
which is broadly applicable to different tasks (i.e. videos).
The overall training process is detailed in Algorithm 1.
² We therefore get a |θ|×4 matrix, where |θ| is the number of parameters in θ. Note that the current update loss ℓ(t−1) is broadcast to have a compatible shape with the other vectors. To better understand this process, we can treat the input of the LSTM as a mini-batch of vectors, where |θ| is the batch size and 4 is the input dimension.
³ We found that using more than one future frame does not improve performance but costs more time during the offline training phase.
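To make the data flow of (9)–(11) concrete, the sketch below applies one element-wise gradient step with per-coordinate learning rates. The coordinate-wise LSTM O(·; ω) is replaced by a hypothetical stub function, since the point here is the update structure, not the learned weights:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def recurrent_update(theta, grad, prev_lr, loss, optimizer):
    """One step of Eqs. (9)-(11): for each coordinate, stack
    [prev_lr, grad, theta, loss] (the element-wise stack I), run the
    shared optimizer on it, squash with a sigmoid to bound the learning
    rate, and take one element-wise gradient descent step."""
    new_lr = [sigmoid(optimizer([l, g, t, loss]))
              for l, g, t in zip(prev_lr, grad, theta)]
    new_theta = [t - lr * g for t, lr, g in zip(theta, new_lr, grad)]
    return new_theta, new_lr

# hypothetical linear stub standing in for the coordinate-wise LSTM
stub = lambda x: 0.1 * sum(x) - 3.0
theta, lr = recurrent_update([1.0, -0.5], [0.2, 0.1], [0.01, 0.01], 0.3, stub)
```

Because the same stub (like the shared LSTM) is applied to every coordinate, the number of optimizer parameters is independent of |θ|.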
Algorithm 1 Offline training of our framework
Input:
  p(V): distribution over training videos.
Output:
  θ(0), λ(0): initial tracking model and learning rates.
  ω: recurrent neural optimizer.
 1: Initialize all network parameters.
 2: while not done do
 3:   Draw a mini-batch of videos: Vi ∼ p(V)
 4:   for all Vi do
 5:     Compute θ(1) ← θ(0) − λ(0) ⊙ ∇θ(0)ℓ(0) as in (9).
 6:     Compute meta loss L(1) using (13).
 7:     for t = {1 + τ, 1 + 2τ, · · · , 1 + (T − 1)τ} do
 8:       Compute adaptive learning rate λ(t−1) using neural optimizer O as in (11).
 9:       Compute updated model θ(t) using (9).
10:       Compute meta loss L(t) using (13).
11:     end for
12:   end for
13:   Compute averaged meta loss L using (14).
14:   Update θ(0), λ(0), ω by computing the gradient of L.
15: end while
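The control flow of Algorithm 1 can be sketched as below, with the tracking model and losses replaced by a toy quadratic task; `ToyClip`, `offline_train_step` and the constant-rate optimizer stub are illustrative stand-ins, not the paper's implementation:

```python
class ToyClip:
    """Quadratic stand-in for a video clip: loss = (theta - target)^2."""
    def __init__(self, target):
        self.target = target

    def update_grad(self, theta):   # gradient of the update loss, Eq. (12)
        return 2.0 * (theta - self.target)

    def meta_loss(self, theta):     # future-frame loss, Eq. (13)
        return (theta - self.target) ** 2

def offline_train_step(clips, theta0, lam0, optimizer, T=3):
    """Inner loop of Algorithm 1: alternate model updates (Eq. 9) with
    meta-loss evaluations (Eq. 13), then average over clips and steps (Eq. 14)."""
    meta_losses = []
    for clip in clips:
        theta, lam = theta0, lam0
        for _ in range(T):
            grad = clip.update_grad(theta)
            lam = optimizer(lam, grad, theta)   # adaptive rate, Eq. (11)
            theta = theta - lam * grad          # one gradient step, Eq. (9)
            meta_losses.append(clip.meta_loss(theta))
    return sum(meta_losses) / len(meta_losses)

constant_rate = lambda lam, grad, theta: 0.4   # stub in place of the LSTM
avg_meta = offline_train_step([ToyClip(1.0)], 0.0, 0.4, constant_rate)
```

In the real framework, the averaged meta loss would then be backpropagated through the unrolled updates to train θ(0), λ(0) and ω jointly (line 14 of Algorithm 1).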
3.3. Random Filter Scaling
Neural optimizers have difficulty generalizing to new tasks due to overfitting, as discussed in [1, 28, 46]. By analyzing the learned behavior of the neural optimizer, we found that our preliminarily trained optimizer predicts similar learning rates (see Fig. 2 left). We attribute this to overfitting to network inputs with similar magnitude scales. The following simple example illustrates the overfitting problem. Suppose the objective function⁴ that the
neural optimizer minimizes is g(θ) = ‖xθ − y‖². The optimal element-wise learning rate is 1/(2x²), since we can then achieve the lowest loss of 0 in one gradient descent step: θ(t+1) = θ(t) − (1/(2x²))∇g(θ(t)) = y/x. Note that the optimal learning rate depends on the network input x, and thus the learned neural optimizer is prone to overfitting if it does not see enough magnitude variations of x. To address this
problem, we multiply the tracking model θ with a randomly
sampled vector ε, which has the same dimension as θ [28], during each iteration of offline training:

ε ∼ exp(U(−κ, κ)), θε = ε ⊙ θ, (15)

where U(−κ, κ) is a uniform distribution over the interval [−κ, κ], and κ is the range factor controlling the scale extent. The objective function is then modified as gε(θ) = g(θε).
In this way, the network input x is indirectly scaled without
modifying the training samples (x, y) in practice. Thus, the
learned neural optimizer is forced to predict adaptive learning rates for inputs with different magnitudes (see Fig. 2 right), rather than producing similar learning rates for similarly scaled inputs, which improves its generalization ability.

⁴ Our actual loss function in (7) includes an L2 loss and a smooth L1 loss; for simplicity here we consider a simple linear model with an L2 loss.

Figure 2: Histograms of predicted learning rates during offline training, without RFS (left) and with RFS (right).
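The scaling in (15) is straightforward to implement; a minimal sketch (the function name is ours):

```python
import math
import random

def random_filter_scaling(theta, kappa):
    """Eq. (15): multiply each parameter by eps ~ exp(U(-kappa, kappa)),
    exposing the optimizer to gradients of varying magnitudes."""
    return [t * math.exp(random.uniform(-kappa, kappa)) for t in theta]

random.seed(0)
scaled = random_filter_scaling([1.0] * 1000, kappa=1.6)
```

Sampling the multiplier in log space keeps it positive and symmetric around 1, so parameters are shrunk or enlarged with equal probability.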
4. Online Tracking via Proposed Framework
The offline training produces a neural optimizer O, ini-
tial tracking model θ(0) and learning rate λ(0), which we
then use to perform online tracking. The overall process is
similar to offline training except that we do not compute the
meta loss or its gradient.
Model Initialization. Given the first frame, the ini-
tial image patch is cropped and centered on the provided
groundtruth bounding box. We then augment the initial im-
age patch to 9 samples by stretching the object to different
aspect ratios and scales. Specifically, the target is stretched
by [swr, sh/r], where (w, h) are the initial object width and
height, and (s, r) are scale and aspect ratio factors. Then we
use these examples, as well as the offline learned θ(0) and
λ(0), to perform one-step gradient update to build an initial
model θ(1) as in (9).
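Under our reading of the stretch factors [swr, sh/r], the 9 augmented target sizes can be enumerated as follows (a hedged sketch; the function name is illustrative):

```python
from itertools import product

def init_target_sizes(w, h, factors=(0.8, 1.0, 1.2)):
    """The 9 stretched sizes (s*r*w, s*h/r) used to augment the initial
    image patch, for all scale/aspect-ratio pairs (s, r)."""
    return [(s * r * w, s * h / r) for s, r in product(factors, repeat=2)]

sizes = init_target_sizes(100, 50)
```

Note that r cancels in the area s²wh, so r only changes the aspect ratio while s controls the scale.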
Bounding Box Estimation. We estimate the object bounding box by first finding the presence of the target through response generation, and then predicting an accurate box via bounding box regression. We employ the penalty strategy used in [23] on the generated response to suppress estimated boxes with large changes in scale and aspect ratio. In addition, we also multiply the response map by a Gaussian-like motion map to suppress large movements. The bounding box computed from the anchor that corresponds to the maximum score of the response map is the final prediction. To smooth the results, we linearly interpolate this estimated object size with the previous one. We denote our neural-optimized tracking model with bounding box regression as ROAM++. We also design a baseline variant of our tracker, which uses multi-scale search to estimate the object box instead of bounding box regression, and denote it as ROAM.
Model Updating. We update the model every τ frames. Although offline training uses the previous τ frames to perform a one-step gradient update of the model, in practice we find that using more than one step can further improve performance during tracking (see Sec. 6.2). Therefore, we adopt a two-step gradient update using the previous 2τ frames in our experiments.
5. Implementation Details
Patch Cropping. Given a bounding box of the object
(x0, y0, w, h), the ROI of the image patch has the same cen-
ter (x0, y0) and takes a larger size S = Sw = Sh = γ√wh,
where γ is the ROI scale factor. Then, the ROI is resized to
a fixed size S × S for batch processing.
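For example, the square ROI side length described above can be computed as follows (a sketch, using γ = 5 from the paragraph below):

```python
import math

def roi_side(w, h, gamma=5):
    """ROI side length S = gamma * sqrt(w * h) around the target centre;
    the square crop is later resized to a fixed input resolution
    (281 x 281 in our setting) for batch processing."""
    return gamma * math.sqrt(w * h)
```

Using the geometric mean sqrt(w·h) makes the crop size depend on the object's area rather than on either dimension alone, so elongated targets get the same amount of context as square ones.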
Network Structure. We use the first 12 convolution lay-
ers of the pretrained VGG-16 [40] as the feature extrac-
tor. The top max-pooling layers are removed to increase
the spatial resolution of the feature map. Both the response
generation network G and the bounding box regression network R consist of two convolutional layers: a dimension reduction layer of 512×64×1×1 (in-channels, out-channels, height, width) as the first layer, and either a correlation layer of 64×1×21×21 or a regression layer of 64×4×21×21 as the second layer, respectively. We use two stacked LSTM
layers with 20 hidden units for the neural optimizer O.
The ROI scale factor is γ = 5 and the search size is
S = 281. The scale factor of filter size is ρ = 1.5. The
response generation uses α = 20 and the feature stride of
the CNN feature extractor is c = 4. The scale and aspect
ratio factors s, r used for initial image patch augmentation
are selected from {0.8, 1, 1.2}, generating a combination of 9 pairs of (s, r). The range factors used in RFS are κcf = 1.6 and κreg = 1.3.⁵
Training Details. We use ADAM [18] optimization with
a mini-batch of 16 video clips of length 31 on 4 GPUs (4
videos per GPU) to train our framework. We use the train-
ing splits of ImageNet VID [21], TrackingNet [30], LaSOT
[10], GOT10k [17], ImageNet DET [21] and COCO [25]
for training. During training, we randomly extract a contin-
uous sequence clip for video datasets, and repeat the same
still image to form a video clip for image datasets. Note we
randomly augment all frames in a training clip by slightly
stretching and scaling the images. We use a learning rate of
1e-6 for the initial regression parameters θ(0) and the ini-
tial learning rate λ(0). For the recurrent neural optimizerO,
we use a learning rate of 1e-3. Both learning rates are mul-
tiplied by 0.5 every 5 epochs. We implement our tracker in
Python with the PyTorch toolbox [35], and conduct the ex-
periments on a computer with an NVIDIA RTX2080 GPU
and Intel(R) Core(TM) i9 CPU @ 3.6 GHz. Our trackers ROAM and ROAM++ run at 13 FPS and 20 FPS respectively (see supplementary for a detailed speed comparison).
6. Experiments
We evaluate our trackers on six benchmarks: OTB-2015
[47], VOT-2016 [19], VOT-2017 [20], LaSOT [10], GOT-
10k [17] and TrackingNet [30].
⁵ We use different range factors κ for θcf and θreg due to the different magnitudes of the parameters in the two branches.
Figure 3: Precision and success plots of OPE on OTB-2015. Precision: SiamRPN+ [0.923], ECO [0.910], ROAM [0.908], DSLT [0.907], ROAM++ [0.904], CCOT [0.898], DaSiamRPN [0.881], MetaTracker [0.856], C-RPN [0.853], SiamRPN [0.851], CREST [0.838], MemTrack [0.820], Staple [0.784], SiamFC [0.771]. Success (AUC): ECO [0.691], ROAM [0.681], ROAM++ [0.680], CCOT [0.671], SiamRPN+ [0.665], DaSiamRPN [0.658], DSLT [0.658], C-RPN [0.639], MetaTracker [0.637], SiamRPN [0.637], MemTrack [0.626], CREST [0.623], SiamFC [0.582], Staple [0.581].
VOT-2016 VOT-2017
EAO(↑) A(↑) R(↓) EAO(↑) A(↑) R(↓)
ROAM++ 0.441 0.599 0.174 0.380 0.543 0.195
ROAM 0.384 0.556 0.183 0.331 0.505 0.226
MetaTracker 0.317 0.519 - - - -
DaSiamRPN 0.411 0.61 0.22 0.326 0.56 0.34
SiamRPN+ 0.37 0.58 0.24 0.30 0.52 0.41
C-RPN 0.363 0.594 - 0.289 - -
SiamRPN 0.344 0.56 0.26 0.244 0.49 0.46
ECO 0.375 0.55 0.20 0.280 0.48 0.27
DSLT 0.343 0.545 0.219 - - -
CCOT 0.331 0.54 0.24 0.267 0.49 0.32
Staple 0.295 0.54 0.38 0.169 0.52 0.69
CREST 0.283 0.51 0.25 - - -
MemTrack 0.272 0.531 0.373 0.248 0.524 0.357
SiamFC 0.235 0.532 0.461 0.188 0.502 0.585
Table 1: Results on VOT-2016/2017. The evaluation metrics are expected
average overlap (EAO), accuracy value (A), robustness value (R). The top
3 performing trackers are colored with red, green and blue respectively.
6.1. Comparison Results with State-of-the-art
We compare our ROAM and ROAM++ with recent re-
sponse generation based trackers including MetaTracker
[34], DSLT [26], MemTrack [49], CREST [41], SiamFC
[4], CCOT [9], ECO [8], Staple [3], as well as recent state-
of-the-art trackers including SiamRPN[23], DaSiamRPN
[53], SiamRPN+ [52] and C-RPN [11] on both OTB and
VOT datasets. For the methods using SGD updates, the number of SGD steps follows their original implementations.
OTB. Fig. 3 presents the experiment results on OTB-
2015 dataset, which contains 100 sequences with 11 an-
notated video attributes. Both our ROAM and ROAM++
achieve similar AUC compared with top performer ECO
and outperform all other trackers. Specifically, our ROAM
and ROAM++ surpass MetaTracker [34], the meta-learning baseline that uses a traditional optimization method for model updating, by 6.9% and 6.7% on the success plot respectively, demonstrating the effectiveness of the proposed recurrent model optimization algorithm and resizable bounding box regression. In addi-
tion, both our ROAM and ROAM++ outperform the very
recent meta learning based tracker MLT [7] by a large mar-
gin (ROAM/ROAM++: 0.681/0.680 vs MLT: 0.611) under
the AUC metric on OTB-2015.
VOT. Table 1 shows the comparison performance on
VOT-2016 and VOT-2017 datasets. Our ROAM++ achieves
the best EAO on both VOT-2016 and VOT-2017. Specifically,
Figure 4: Precision and success plots of OPE on the LaSOT test set. Precision: ROAM++ [0.445], MDNet [0.373], ROAM [0.368], VITAL [0.360], SiamFC [0.339], StructSiam [0.333], DSiam [0.322], ECO [0.301], STRCF [0.298], SINT [0.295], ECO_HC [0.279], CFNet [0.259]. Success (AUC): ROAM++ [0.447], MDNet [0.397], ROAM [0.390], VITAL [0.390], SiamFC [0.336], StructSiam [0.335], DSiam [0.333], ECO [0.324], SINT [0.314], STRCF [0.308], ECO_HC [0.304], CFNet [0.275].
           MDNet [33]  CF2 [29]  ECO [8]  CCOT [9]  GOTURN [15]  SiamFC [4]  SiamFCv2 [44]  ROAM   ROAM++
AO(↑)      0.299       0.315     0.316    0.325     0.342        0.348       0.374          0.436  0.465
SR0.5(↑)   0.303       0.297     0.309    0.328     0.372        0.353       0.404          0.466  0.532
SR0.75(↑)  0.099       0.088     0.111    0.107     0.124        0.098       0.144          0.164  0.236
Table 2: Results on GOT-10k. The evaluation metrics include average overlap (AO), success rate at 0.5 overlap threshold (SR0.5), and success rate at 0.75 overlap threshold (SR0.75). The top three performing trackers are colored red, green and blue respectively.
both our ROAM++ and ROAM show superior robustness compared with RPN-based trackers, which have no model updating, demonstrating the effectiveness of our recurrent model optimization scheme. In addition, our ROAM++ and ROAM outperform the baseline MetaTracker [34] by 39.2% and 21.1% in EAO on VOT-2016, respectively.
LaSOT. LaSOT [10] is a recently proposed large-scale
tracking dataset. We evaluate our ROAM against the top-
10 performing trackers of the benchmark [10], including
MDNet [33], VITAL [42], SiamFC [4], StructSiam [51],
DSiam [14], SINT [14], ECO [8], STRCF [24], ECO HC
[8] and CFNet [44], on the testing split, which consists of 280 videos. Fig. 4 presents the precision and success plots on the LaSOT test set. Our ROAM++ achieves the best results among state-of-the-art trackers on this benchmark, outperforming the second-best MDNet by 19.3% and 12.6% on the precision and success plots respectively.
GOT-10k. GOT-10k [17] is a large, highly diverse dataset for object tracking that was also proposed recently. There is no overlap in object categories between the training set and the test set, which follows the one-shot learning setting [12]. Therefore, using external training data is strictly forbidden when testing trackers on their online server. Following this protocol, we train our ROAM using only the training split of this dataset. Table 2 shows the detailed comparison results on the GOT-10k test set. Both our ROAM++ and ROAM surpass the other trackers by a large margin. In particular, our ROAM++ obtains an AO of 0.465, an SR0.5 of 0.532 and an SR0.75 of 0.236, outperforming SiamFCv2 by 24.3%, 31.7% and 63.9% respectively.
TrackingNet. TrackingNet [30] provides more than 30K videos with around 14M dense bounding box annotations, built by filtering shorter video clips from YouTube-BB [5]. Table 3 presents the detailed comparison results on the TrackingNet test set. Our ROAM++ surpasses other state-of-the-art tracking algorithms on all three evaluation metrics. In detail, our ROAM++ obtains improvements of 10.6%, 10.3% and 6.9% on AUC, precision and normalized precision respectively over MDNet, the top performing tracker on the benchmark.

                Staple [3]  CSRDCF [27]  ECOhc [8]  ECO [8]  SiamFC [4]  CFNet [44]  MDNet [33]  ROAM   ROAM++
AUC(↑)          0.528       0.534        0.541      0.554    0.571       0.578       0.606       0.620  0.670
Prec.(↑)        0.470       0.480        0.478      0.492    0.533       0.533       0.565       0.547  0.623
Norm. Prec.(↑)  0.603       0.622        0.608      0.618    0.663       0.654       0.705       0.695  0.754
Table 3: Results on TrackingNet. The evaluation metrics include area under curve (AUC) of the success plot, Precision, and Normalized Precision. The top three performing trackers are colored red, green and blue respectively.

Figure 5: Ablation study on OTB-2015 with different variants of ROAM. Left (success AUC): ROAM-oracle [0.742], SGD-oracle [0.699], ROAM [0.675], SGD [0.636], ROAM-w/o RFS [0.636]. Right (success AUC): ROAM [0.675], ROAM-GRU [0.668], ROAM-FC [0.647], ROAM-ConstLR [0.612].
6.2. Ablation Study
For a deeper analysis, we study our trackers from various
aspects. Note that all these ablations are trained only on
ImageNet VID dataset for simplicity.
Impact of Different Modules. To verify the effective-
ness of different modules, we design four variants of our
framework: 1) SGD: replacing recurrent neural optimizer
with traditional SGD for model updating (using the same
number of gradient steps as ROAM); 2) ROAM-w/o RFS:
training a recurrent neural optimizer without RFS; 3) SGD-
Oracle: using the ground-truth bounding boxes to build
updating samples for SGD during the testing phase (using
the same number of gradient steps as ROAM); 4) ROAM-
Oracle: using the ground-truth bounding boxes to build up-
dating samples for ROAM during the testing phase. The
results are presented in Fig. 5 (Left). ROAM gains about
6% improvement on AUC compared with the baseline SGD,
demonstrating the effectiveness of our recurrent model op-
timization method on model updating. Without RFS dur-
ing offline training, the tracking performance drops sub-
stantially due to overfitting. ROAM-Oracle performs bet-
ter than SGD-Oracle, showing that our offline learned neu-
ral optimizer is more effective than traditional SGD method
given the same updating samples. In addition, these two or-
acles (SGD-oracle and ROAM-oracle) achieve higher AUC
score compared with their normal versions, indicating that
the tracking accuracy could be boosted by improving the
quality of the updating samples.
Architecture of Neural Optimizer. To investigate more architectures for the neural optimizer, we present three variants of our method: 1) ROAM-GRU: using two stacked Gated Recurrent Units (GRU) [6] as the neural optimizer;
2) ROAM-FC: using two linear fully-connected layers followed by a tanh activation function as the neural optimizer; 3) ROAM-ConstLR: using a learned constant element-wise learning rate for model optimization instead of the adaptively generated one. Fig. 5 (Right) presents the results. Using ROAM-GRU decreases the AUC slightly, while ROAM-FC has significantly lower AUC than ROAM, showing the importance of our recurrent structure. Moreover, the performance drop of ROAM-ConstLR verifies the necessity of an adaptive learning rate for model updating.

Figure 6: AUC vs. gradient steps on OTB-2015 for ROAM, SGD(7e-6) and SGD(7e-7).

Figure 7: Update loss (top row) and meta loss (bottom row) comparison between ROAM and SGD on four OTB-2015 videos (Car24, Deer, Fish, Jumping).
More Steps in Updating. During offline training, we only perform one-step gradient descent to optimize the updating loss. We investigate the effect of using more than one gradient step on tracking performance during the test phase for both ROAM and SGD (see Fig. 6). Our method can be further improved with more than one step, but performance gradually decreases when too many steps are used. This is because our framework is not trained to perform so many steps during the offline stage. We also use two fixed learning rates for SGD, where the larger one is 7e-6⁶ and the smaller one is 7e-7. Using the larger learning rate, SGD reaches its best performance much faster than with the smaller one, while both have similar best AUC. Our ROAM consistently outperforms SGD(7e-6), showing the superiority of adaptive element-wise learning rates. Furthermore, ROAM with 1-2 gradient steps outperforms SGD(7e-7) using a large number of steps, which shows the improved generalization of ROAM.
Update Loss and Meta Loss Comparison. To show
the effectiveness of our neural optimizer, we compare the
update loss and meta loss over time between ROAM and
SGD after two gradient steps for a few videos of OTB-2015
in Fig. 7 (see supplementary for more examples). Under
the same number of gradient updates, our neural optimizer
obtains lower losses than traditional SGD, demonstrating its faster convergence and better generalization ability for model optimization.

⁶ MetaTracker uses this learning rate.

Figure 8: Histograms of initial learning rates and updating learning rates on OTB-2015.
Why does ROAM work? As discussed in [34], directly
using the learned initial learning rate λ(0) for model op-
timization in subsequent frames can lead to divergence.
This is because the learning rates needed for model initial-
ization are relatively larger than those needed for subse-
quent frames, which causes unstable model optimization.
In particular, the initial model θ(0) is trained offline to
be broadly applicable to different videos, and therefore
needs a relatively large gradient step to adapt to a spe-
cific task, leading to a relatively large λ(0). For subse-
quent frames, the appearance variations may be sometimes
small and sometimes large, so the model optimization
process needs an adaptive learning rate to handle these
different situations. Fig. 8 presents the histograms of initial
learning rates and updating learning rates on OTB-2015.
Most updating learning rates are relatively small because
there are usually only minor appearance variations between
updates. As shown in Fig. 3, our ROAM, which updates
the model for subsequent frames with adaptive learning
rates, obtains a substantial performance gain over Meta-
Tracker [34], which uses traditional SGD with a constant
learning rate for model updating.
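To make the mechanism concrete, here is a minimal coordinate-wise sketch of a recurrent optimizer that emits bounded element-wise learning rates from gradients. The scalar cell weights, the tanh/sigmoid cell form, and the rate bound are invented for illustration; this is not the paper's trained LSTM:

```python
import numpy as np

# Toy scalar cell weights (untrained, shared across coordinates).
W_h, W_g, b = 0.5, 1.0, -2.0

def recurrent_lr(grad, h):
    """Update the hidden state from the gradient and squash the
    output to a positive element-wise rate in (0, 0.1)."""
    h = np.tanh(W_h * h + W_g * grad)
    lam = 0.1 / (1.0 + np.exp(-(h + b)))     # sigmoid gate -> (0, 0.1)
    return lam, h

# A few meta-updates of a parameter vector on a toy quadratic loss.
theta = np.array([1.0, -2.0])
h = np.zeros_like(theta)
for _ in range(3):
    g = 2.0 * theta                          # gradient of ||theta||^2
    lam, h = recurrent_lr(g, h)
    theta = theta - lam * g                  # element-wise adaptive step

print(np.linalg.norm(theta) < np.sqrt(5))   # True: loss decreased
```

Because the cell carries a hidden state across updates, the emitted rate can stay small when successive gradients are small (minor appearance change) and grow when they are large, matching the behavior seen in Fig. 8.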
7. Conclusion
In this paper, we propose a novel tracking model con-
sisting of a resizable response generator and a bounding
box regressor, where the first part generates a score map
to indicate the presence of the target at different spatial
locations and the second module regresses bounding box
coordinate shifts relative to anchors densely mounted on
sliding-window positions. To effectively update the track-
ing model for appearance adaptation, we propose a recur-
rent model optimization method within a meta-learning
setting that converges the model in a few gradient steps.
Instead of producing an increment for parameter updating,
which is prone to overfitting due to the different gradient
scales, we recurrently generate adaptive learning rates for
tracking model optimization using an LSTM. Extensive
experiments on the OTB, VOT, LaSOT, GOT-10k and
TrackingNet datasets demonstrate superior performance
compared with state-of-the-art tracking algorithms.
Acknowledgement: This work was supported by grants
from the Research Grants Council of the Hong Kong Spe-
cial Administrative Region, China (Project No. [T32-
101/15-R] and CityU 11212518).
References
[1] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to Learn by Gradient Descent by Gradient Descent. In NeurIPS, 2016.
[2] Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the Optimization of a Synaptic Learning Rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, 1992.
[3] Luca Bertinetto, Jack Valmadre, Stuart Golodetz, Ondrej Miksik, and Philip Torr. Staple: Complementary Learners for Real-Time Tracking. In CVPR, 2016.
[4] Luca Bertinetto, Jack Valmadre, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. Fully-Convolutional Siamese Networks for Object Tracking. In ECCV Workshop on VOT, 2016.
[5] Esteban Real, Jonathon Shlens, Stefano Mazzocchi, Xin Pan, and Vincent Vanhoucke. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video. In CVPR, 2017.
[6] Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078, 2014.
[7] Janghoon Choi, Junseok Kwon, and Kyoung Mu Lee. Deep Meta Learning for Real-Time Target-Aware Visual Tracking. In ICCV, 2019.
[8] Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. ECO: Efficient Convolution Operators for Tracking. In CVPR, 2017.
[9] Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, and Michael Felsberg. Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking. In ECCV, 2016.
[10] Heng Fan, Liting Lin, Fan Yang, and Peng Chu. LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking. In CVPR, 2019.
[11] Heng Fan and Haibin Ling. Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking. In CVPR, 2019.
[12] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML, 2017.
[13] Hamed Kiani Galoogahi, Terence Sim, and Simon Lucey. Correlation Filters with Limited Boundaries. In CVPR, 2015.
[14] Qing Guo, Wei Feng, Ce Zhou, Rui Huang, Liang Wan, and Song Wang. Learning Dynamic Siamese Network for Visual Object Tracking. In ICCV, 2017.
[15] David Held, Sebastian Thrun, and Silvio Savarese. Learning to Track at 100 FPS with Deep Regression Networks. In ECCV, 2016.
[16] Joao F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista. High-Speed Tracking with Kernelized Correlation Filters. TPAMI, 2015.
[17] Lianghua Huang, Xin Zhao, and Kaiqi Huang. GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild. arXiv, 2019.
[18] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.
[19] Matej Kristan, Ales Leonardis, Jiri Matas, and Michael Felsberg. The Visual Object Tracking VOT2016 Challenge Results. In ECCV Workshop on VOT, 2016.
[20] Matej Kristan, Ales Leonardis, Jiri Matas, Michael Felsberg, Roman Pflugfelder, Luka Cehovin Zajc, Tomas Vojir, Gustav Hager, Alan Lukezic, Abdelrahman Eldesokey, Gustavo Fernandez, and others. The Visual Object Tracking VOT2017 Challenge Results. In ICCV Workshop on VOT, 2017.
[21] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NeurIPS, 2012.
[22] Bo Li, Wei Wu, Qiang Wang, Fangyi Zhang, Junliang Xing, and Junjie Yan. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In CVPR, 2019.
[23] Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High Performance Visual Tracking with Siamese Region Proposal Network. In CVPR, 2018.
[24] Feng Li, Cheng Tian, Wangmeng Zuo, Lei Zhang, and Ming-Hsuan Yang. Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking. In CVPR, 2018.
[25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[26] Xiankai Lu, Chao Ma, Bingbing Ni, Xiaokang Yang, Ian Reid, and Ming-Hsuan Yang. Deep Regression Tracking with Shrinkage Loss. In ECCV, 2018.
[27] Alan Lukezic, Tomas Vojir, Luka Cehovin, Jiri Matas, and Matej Kristan. Discriminative Correlation Filter with Channel and Spatial Reliability. In CVPR, 2017.
[28] Kaifeng Lv, Shunhua Jiang, and Jian Li. Learning Gradient Descent: Better Generalization and Longer Horizons. In ICML, 2017.
[29] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang. Hierarchical Convolutional Features for Visual Tracking. In ICCV, 2015.
[30] Matthias Muller, Adel Bibi, Silvio Giancola, Salman Al-Subaihi, and Bernard Ghanem. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In ECCV, 2018.
[31] Tsendsuren Munkhdalai and Hong Yu. Meta Networks. In ICML, 2017.
[32] Devang K. Naik and R. J. Mammone. Meta-Neural Networks that Learn by Learning. In IJCNN, 1992.
[33] Hyeonseob Nam and Bohyung Han. Learning Multi-Domain Convolutional Neural Networks for Visual Tracking. In CVPR, 2016.
[34] Eunbyung Park and Alexander C. Berg. Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers. In ECCV, 2018.
[35] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic Differentiation in PyTorch. In NeurIPS Workshop, 2017.
[36] Sachin Ravi and Hugo Larochelle. Optimization as a Model for Few-Shot Learning. In ICLR, 2017.
[37] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS, 2015.
[38] Jurgen Schmidhuber. Evolutionary Principles in Self-Referential Learning: On Learning How to Learn: The Meta-Meta-Meta...-Hook. PhD thesis, 1987.
[39] John Schulman, Jonathan Ho, Xi Chen, and Pieter Abbeel. Meta Learning Shared Hierarchies. arXiv, 2018.
[40] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR, 2015.
[41] Yibing Song, Chao Ma, Lijun Gong, Jiawei Zhang, Rynson Lau, and Ming-Hsuan Yang. CREST: Convolutional Residual Learning for Visual Tracking. In ICCV, 2017.
[42] Yibing Song, Chao Ma, Xiaohe Wu, Lijun Gong, Linchao Bao, Wangmeng Zuo, Chunhua Shen, Rynson Lau, and Ming-Hsuan Yang. VITAL: VIsual Tracking via Adversarial Learning. In CVPR, 2018.
[43] Ran Tao, Efstratios Gavves, and Arnold W. M. Smeulders. Siamese Instance Search for Tracking. In CVPR, 2016.
[44] Jack Valmadre, Luca Bertinetto, Joao F. Henriques, Andrea Vedaldi, and Philip H. S. Torr. End-to-End Representation Learning for Correlation Filter Based Tracking. In CVPR, 2017.
[45] Ning Wang, Yibing Song, Wengang Zhou, Wei Liu, and Houqiang Li. Unsupervised Deep Tracking. In CVPR, 2019.
[46] Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, and Jascha Sohl-Dickstein. Learned Optimizers that Scale and Generalize. In ICML, 2017.
[47] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Object Tracking Benchmark. TPAMI, 2015.
[48] Tianyu Yang and Antoni B. Chan. Recurrent Filter Learning for Visual Tracking. In ICCV Workshop on VOT, 2017.
[49] Tianyu Yang and Antoni B. Chan. Learning Dynamic Memory Networks for Object Tracking. In ECCV, 2018.
[50] Tianyu Yang and Antoni B. Chan. Visual Tracking via Dynamic Memory Networks. TPAMI, 2019.
[51] Yunhua Zhang, Lijun Wang, Dong Wang, and Huchuan Lu. Structured Siamese Network for Real-Time Visual Tracking. In ECCV, 2018.
[52] Zhipeng Zhang, Houwen Peng, and Qiang Wang. Deeper and Wider Siamese Networks for Real-Time Visual Tracking. In CVPR, 2019.
[53] Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, and Weiming Hu. Distractor-Aware Siamese Networks for Visual Object Tracking. In ECCV, 2018.